Planet SysAdmin


June 19, 2018

Chris Siebenmann

Getting Xorg to let you terminate the X server with Ctrl + Alt + Backspace

This is a saga. You can skip to the end for the actual answer if you're impatient.

Yesterday I wrote about the history of terminating the X server with Ctrl + Alt + Backspace. I've known about this feature for a long time, but I only wind up using it very occasionally, even for Cinnamon on my laptop. This infrequent usage explains how I only recently noticed that it had stopped working on my office machine. When I read the Xorg manpage for another reason recently, I stumbled over the current XKB mechanism and decided to write a little entry about it, which I planned to save for a day when I was extra tired. Then I decided to do some research first, got some surprises, and wrote yesterday's entry instead.

My initial assumption about why C-A-B wasn't working for me was that the Xorg people had switched it off relatively recently (or changed some magic thing in how you had to turn it on). This 2010 SE question and its answers taught me otherwise; the switch had happened a very long time ago, and I was relatively certain that I had used C-A-B since then on my machines. So what had changed?

These days, the X server is mostly configured through configuration file snippets in a directory; on at least Fedora, this is /etc/X11/xorg.conf.d. In my office workstation's directory, I found a 00-keyboard.conf that dated from the start of 2015 and looked like this:

# Read and parsed by systemd-localed.
# It's probably wise not to edit this
# file manually too freely.
Section "InputClass"
   Identifier "system-keyboard"
   MatchIsKeyboard "on"
   Option "XkbLayout" "us"
   Option "XkbModel" "pc105+inet"
   Option "XkbVariant" "terminate:ctrl_alt_bksp,"
EndSection

I scanned this and said to myself 'well, it's setting the magic XKB option, so something else must be wrong'. I switched to using XKB back at the end of 2015, so at first I thought that my setxkbmap usage was overwriting this. However, inspection of the manpage told me that I was wrong (the settings are normally merged), and an almost identical 00-keyboard.conf on my home workstation worked with my normal setxkbmap. So yesterday I tiredly posted my history entry and muttered to myself.

This morning, with fresh eyes, I looked at this again and noticed the important thing: this file is setting the XKB keyboard variant, not the XKB options. It should actually be setting "XkbOptions", not "XkbVariant". Since there's no such keyboard variant, this actually did nothing except fool me. I might have noticed the issue if I'd run 'setxkbmap -query', but perhaps not.

All of this leads to the three ways to enable Ctrl + Alt + Backspace termination of the X server, at least on a systemd based system. First, as part of your X session startup you can run setxkbmap to specifically enable C-A-B, among any other XKB changes you're already making:

setxkbmap -option 'compose:rwin' -option 'ctrl:nocaps' -option 'terminate:ctrl_alt_bksp'
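
(As a quick check that the option actually took effect, setxkbmap can report the running server's current settings; if things worked, terminate:ctrl_alt_bksp should show up on the options line:)

setxkbmap -query | grep -i options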

Second, you can manually create or edit a configuration file snippet in /etc/X11/xorg.conf.d or your equivalent to specify this. If you already have a 00-keyboard.conf or the equivalent, the option you want is:

Option "XkbOptions" "terminate:ctrl_alt_bksp"

(A trailing comma is okay, apparently.)

Third, if you have Fedora or perhaps any systemd-based distribution, you can configure this the official way by running localectl with a command like this:

localectl --no-convert set-x11-keymap us pc105+inet "" terminate:ctrl_alt_bksp

There is a bear trap lurking here. That innocent looking "" is very important, as covered in the Arch wiki page. As they write (with my emphasis):

To set a model, variant, or options, all preceding fields need to be specified, but the preceding fields can be skipped by passing an empty string with "". [...]

Given that my original xorg.conf snippet had what should be the XKB options as the XKB variant, it seems very likely that back in January 2015, something ran localectl and left out that all important "".
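
To make the bear trap concrete, the January 2015 command was probably the first of these two when it should have been the second; the only difference is the "" for the (skipped) variant field:

localectl --no-convert set-x11-keymap us pc105+inet terminate:ctrl_alt_bksp
localectl --no-convert set-x11-keymap us pc105+inet "" terminate:ctrl_alt_bksp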

(That I didn't really notice for a bit more than three years shows some mixture of how little I use C-A-B and how willing I am to shrug and ignore minor mysteries involving keyboards and X.)

My laptop had been set up and maintained as a stock Fedora machine; these days that apparently means that this option isn't enabled in the xorg.conf stuff. Unlike on my workstation (where I edited 00-keyboard.conf directly), I did it the official way through localectl. I determined the other command line parameters by looking at the existing 00-keyboard.conf; I believe that on the laptop, the model (the 'pc105+inet' bit) was blank, as was the variant.

Sidebar: How my machines got to their Xorg keyboard state

I assume that before that early 2015 change, my office workstation's Xorg configuration had the magic XkbOptions setting that made it work. I'm pretty sure that C-A-B worked at some point since 2010 or 2011 or so. My home machine has a 00-keyboard.conf from October 2011, which is about when I installed Fedora on it, with comments that say it was created by system-setup-keyboard, and that has the necessary XkbOptions setting. My office machine's Fedora install dates to 2006, so it might have had any number of configuration oddities that confused things at some point.

(My home machine got a completely new Fedora 15 install in 2011 as part of digging myself out of my Fedora 8 hole. My office workstation never got stuck on an older Fedora release the way my home machine did, so the Fedora install's never been rebuilt from scratch. Sometimes I get vaguely tempted by the idea of a from-scratch rebuild, but then I get terrified of how much picky work it would be just to get back to where I am now.)

by cks at June 19, 2018 12:56 PM

A broad overview of how modern Linux systems boot

For reasons beyond the scope of this entry, today I feel like writing down a broad and simplified overview of how modern Linux systems boot. Due to being a sysadmin who has stubbed his toe here repeatedly, I'm going to especially focus on points of failure.

  1. The system loads and starts the basic bootloader somehow, through either BIOS MBR booting or UEFI. This can involve many steps on its own and any number of things can go wrong, such as unsigned UEFI bootloaders on a Secure Boot system. Generally these failures are the most total; the system reports there's nothing to boot, or it repeatedly reboots, or the bootloader aborts with what is generally a cryptic error message.

    On a UEFI system, the bootloader needs to live in the EFI system partition, which is always a FAT32 filesystem. Some people have had luck making this a software RAID mirror with the right superblock format; see the comments on this entry.

  2. The bootloader loads its configuration file and perhaps additional modules from somewhere, usually your /boot but also perhaps your UEFI system partition. Failures here can result in extremely cryptic errors, dropping you into a GRUB shell, or ideally a message saying 'can't find your menu file'. The configuration file location is usually hardcoded, which is sometimes unfortunate if your distribution has picked a bad spot.

    For GRUB, this spot has to be on a filesystem and storage stack that GRUB understands, which is not necessarily the same as what your Linux kernel understands. Fortunately GRUB understands a lot these days, so under normal circumstances you're unlikely to run into this.

    (Some GRUB setups have a two stage configuration file, where the first stage just finds and loads the second one. This allows you more flexibility in where the second stage lives, which can be important on UEFI systems.)

  3. Using your configuration file, the bootloader loads your chosen Linux kernel and an initial ramdisk into memory and transfers control to the kernel. The kernel and initramfs image also need to come from a filesystem that your bootloader understands, but with GRUB the configuration file allows you to be very flexible about how they're found and where they come from (and it doesn't have to be the same place as grub.cfg is, although on a non-UEFI system both are usually in /boot).

    There are two things that can go wrong here; your grub.cfg can have entries for kernels that don't exist any more, or GRUB can fail to locate and bring up the filesystem where the kernel(s) are stored. The latter can happen if, for example, your grub.cfg has the wrong UUIDs for your filesystems. It's possible to patch this up on the fly so you can boot your system.

  4. The kernel starts up, creates PID 1, and runs /init from the initramfs as PID 1. This process and the things that it runs then flail around doing various things, with the fundamental goal of finding and mounting your real root filesystem and transferring control to it. In the process of doing this it will try to assemble software RAID devices and other storage stuff like LVM, perhaps set sysctls, and so on. The obvious and traditional failure mode here is that the initramfs can't find or mount your root filesystem for some reason; this usually winds up dropping you into some sort of very minimal rescue shell. If this happens to you, you may want to boot from a USB live image instead; they tend to have more tools and a better environment.

    (Sometimes the reasons for failure are obscure and annoying.)

    On many traditional systems, the initramfs /init was its own separate thing, often a shell script, and was thus independent from and different from your system's real init. On systemd based systems, the initramfs /init is actually systemd itself and so even very early initramfs boot is under systemd's control. In general, a modern initramfs is a real (root) filesystem that processes in the initramfs will see as /, and its contents (both configuration files and programs) are usually copied from the versions in your root filesystem. You can inspect the whole thing with lsinitrd or lsinitramfs.

    Update: It turns out that the initramfs init is still a shell script in some Linux distributions, prominently Debian and Ubuntu. The initramfs init being systemd may be a Red Hat-ism (Fedora and RHEL). Thanks to Ben Hutchings in the comments for the correction.

    How the initramfs /init pivots into running your real system's init daemon on your real system's root filesystem is beyond the scope of this entry. The commands may be simple (systemd just runs 'systemctl switch-root'), but how they work is complicated.

    (That systemd is the initramfs /init is convenient in a way, because it means that you don't need to learn an additional system to inspect how your initramfs works; instead you can just look at the systemd units included in the initramfs and follow along in the systemd log.)

  5. Your real init system starts up and performs basic system setup to bring the system to what we think of as its normal baseline state; basically, this is everything you usually get if you boot into a modern single user mode. This does things like set the hostname, remount the root filesystem read-write, apply your sysctl settings (from the real root filesystem this time), configure enough networking so that you have a loopback device and the IPv4 and IPv6 localhost addresses, have udev fiddle around with hardware, and especially mount all of your local filesystems (which includes activating underlying storage systems like software RAID and LVM, if they haven't been activated already in the initramfs).

    The traditional thing that fails here is that one or more of your local filesystems can't be mounted. This often causes this process to abort and drop you into a single user rescue shell environment.

    (On a systemd system the hostname is actually set twice, once in the initramfs and then again in this stage.)

  6. With your local filesystems mounted and other core configuration in place, your init system continues on to boot your system the rest of the way. This does things like configure your network (well, perhaps; these days some systems may defer it until you log in), start all of the system's daemons, and eventually enable logins on text consoles and perhaps start a graphical login environment like GDM or LightDM. At the end of this process, your system is fully booted.

    Things that fail here are problems like a daemon not starting or, more seriously, the system not finding the network devices it expects and so not getting itself on the network at all. Usually the end result is that you still wind up with a login prompt (either a text console or graphics), it's just that there were error messages (which you may not have seen) or some things aren't working. Very few modern systems abort the boot and drop into a rescue environment for failures during this stage.

    On a systemd system, this transfers control from the initramfs systemd to the systemd binary on your root filesystem (which takes over as PID 1), but systemd maintains continuity of its state and boot process and you can see the whole thing in journalctl. The point where the switch happens is reported as 'Starting Switch Root...' and then 'Switching root.'
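
(As a concrete illustration of steps 4 and 6 on a systemd-based system, you can list the contents of the current initramfs and watch the switch-root hand-off in the journal for the current boot. This is a sketch; lsinitrd is the Fedora/RHEL tool and Debian and Ubuntu have lsinitramfs instead, and the exact journal messages vary between versions.)

lsinitrd | less
journalctl -b | grep -iE 'switch(ing)? root'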

All of System V init, Upstart, and systemd have this distinction between the basic system setup steps and the later 'full booting' steps, but they implement it in different ways. Systemd doesn't draw a hard distinction between the two phases and you can shim your own steps into either portion in basically the same way. System V init tended to implement the early 'single user' stage as a separate nominal runlevel, runlevel 'S', that the system transitioned through on the way to its real target runlevel. Upstart is sort of a hybrid; it has a startup event that's emitted to trigger a number of things before things start fully booting.

(This really is an overview. Booting Linux on PC hardware has become a complicated process at the best of times, with a lot of things to set up and fiddle around with.)

by cks at June 19, 2018 01:11 AM

June 18, 2018

Steve Kemp's Blog

Monkeying around with interpreters - Result

So I challenged myself to write a BASIC interpreter over the weekend; unfortunately, I did not succeed.

What I did was take an existing monkey-repl and extend it with a series of changes to make sure that I understood all the various parts of the interpreter design.

Initially I was just making basic changes:

  • Added support for single-line comments.
    • For example "// This is a comment".
  • Added support for multi-line comments.
    • For example "/* This is a multi-line comment */".
  • Expand \n and \t in strings.
  • Allow the index operation to be applied to strings.
    • For example "Steve Kemp"[0] would result in S.
  • Added a type function.
    • For example "type(3.13)" would return "float".
    • For example "type(3)" would return "integer".
    • For example "type("Moi")" would return "string".

Once I did that I overhauled the built-in functions, allowing callers to register golang functions to make them available to their monkey-scripts. Using this I wrote a simple "standard library" with some simple math, string, and file I/O functions.

The end result was that I could read files, line-by-line, or even just return an array of the lines in a file:

 // "wc -l /etc/passwd" - sorta
 let lines = file.lines( "/etc/passwd" );
 if ( lines ) {
    puts( "Read ", len(lines), " lines\n" )
 }

Adding file I/O was pretty neat, although I only did reading. Looping over a file's contents is a little verbose:

 // wc -c /etc/passwd, sorta.
 let handle = file.open("/etc/passwd");
 if ( handle < 0 ) {
   puts( "Failed to open file" )
 }

 let c = 0;       // count of characters
 let run = true;  // still reading?

 for( run == true ) {

    let r = read(handle);
    let l = len(r);
    if ( l > 0 ) {
        let c = c + l;
    }
    else {
        let run = false;
    }
 };

 puts( "Read " , c, " characters from file.\n" );
 file.close(handle);

This morning I added some code to interpolate hash-values into a string:

 // Hash we'll interpolate from
 let data = { "Name":"Steve", "Contact":"+358449...", "Age": 41 };

 // Expand the string using that hash
 let out = string.interpolate( "My name is ${Name}, I am ${Age}", data );

 // Show it worked
 puts(out + "\n");

Finally I added some type-conversions, allowing strings/floats to be converted to integers, and allowing other values to be changed to strings. With the addition of a math.random function we then got:

 // math.random() returns a float between 0 and 1.
 let rand = math.random();

 // modify to make it from 1-10 & show it
 let val = int( rand * 10 ) + 1 ;
 puts( "math.random() -> ", val , "\n");

The only other significant change was the addition of a new form of function definition. Rather than defining functions like this:

 let hello = fn() { puts( "Hello, world\n" ) };

I updated things so that you could also define a function like this:

 function hello() { puts( "Hello, world\n" ) };

(The old form still works, but this is "clearer" in my eyes.)

Maybe next weekend I'll try some more BASIC work, though for the moment I think my monkeying around is done. The world doesn't need another scripting language, and as I mentioned there are a bunch of implementations of this around.

The new structure I made makes adding a real set of standard-libraries simple, and you could embed the project, but I'm struggling to think of why you would want to. (Though I guess you could pretend you're embedding something more stable than anko and not everybody loves javascript as a golang extension language.)

June 18, 2018 09:01 AM

Chris Siebenmann

The history of terminating the X server with Ctrl + Alt + Backspace

If your Unix machine is suitably configured, hitting Ctrl + Alt + Backspace will immediately terminate the X server, or more accurately will cause the X server to immediately exit. This is an orderly exit from the server's perspective (it will do things like clean up the graphics state), but an abrupt one for clients; the server just closes their connections out of the blue. It turns out that the history of this feature is a bit more complicated than I thought.

Once upon a time, way back when, there were the X releases from the (MIT) X Consortium. These releases came with a canonical X server, with support for various Unix workstation hardware. For a long time, the only way to get this server to terminate abruptly was to send it a SIGINT or SIGQUIT signal. In X11R4, which I believe was released in 1989, IBM added a feature to the server drivers for their hardware (and thus to the X server that would run on their AIX workstations); if you hit Control, Alt, and Backspace, the server would act as if it had received a SIGINT signal and immediately exit.

(HP Apollo workstations also would immediately exit the X server if you hit the 'Abort/Exit' key that they had on their custom keyboard, but I consider this a different sort of thing since it's a dedicated key.)

In X11R5, released in 1991, two things happened. First, IBM actually documented this key sequence in server/ddx/ibm/README (previously it was only mentioned in the server's IBM-specific usage messages). Second, X386 was included in the release, and its X server hardware support also contained a Ctrl + Alt + Backspace 'terminate the server' feature. This feature was carried on into XFree86 and thus the version of the X server that everyone ran on Linux and the *BSDs. The X386 manpage documents it this way:

Ctrl+Alt+Backspace
Immediately kills the server -- no questions asked. (Can be disabled by specifying "dontzap" in the configuration file.)

I never used IBM workstations, so my first encounter with this was with X on either BSDi or Linux. I absorbed it as a PC X thing, one that was periodically handy for various reasons (for instance, if my session got into a weird state and I just wanted to yank the rug out from underneath it and start again).

For a long time, XFree86/Xorg defaulted to having this feature on. Various people thought that this was a bad idea, since it gives people an obscure gun to blow their foot off with, and eventually these people persuaded the Xorg people to change the default. In X11R7.5, released in October of 2009, Xorg changed things around so that C-A-B would default to off in a slightly tricky way and that you would normally use an XKB option to control this; see also the Xorg manpage.

(You can set this option by hand with setxkbmap, or your system may have an xorg.conf.d snippet that sets this up automatically. Note that running setxkbmap by hand normally merges your changes with the system settings; see its manpage.)

Sidebar: My understanding of how C-A-B works today

In the original X386 implementation (and the IBM one), the handling of C-A-B was directly hard-coded in the low level keyboard handling. If the code saw Backspace while Ctrl and Alt were down, it called the generic server code's GiveUp() function (which was also connected to SIGINT and SIGQUIT) and that was that.

In modern Xorg X with XKB, there's a level of indirection involved. The server has an abstracted Terminate_Server event (let's call it that) that triggers the X server exiting, and in order to use it you need to map some actual key combination to generate this event. The most convenient way to do this is through setxkbmap, provided that all you want is the Ctrl + Alt + Backspace combination, but apparently you can do this with xmodmap too and you'll probably have to do that if you want to invoke it through some other key combination.

The DontZap server setting still exists (it now defaults to off), but what it controls today is whether or not the server will pay attention to a Terminate_Server event if you generate one. This is potentially useful if you want to not just disable C-A-B by default but also prevent people from enabling it at all.

I can see why the Xorg people did it this way and why it makes sense, but it does create extra intricacy.

by cks at June 18, 2018 03:53 AM

June 17, 2018

Errata Security

Notes on "The President is Missing"

Former president Bill Clinton has contributed to a cyberthriller "The President is Missing", the plot of which is that the president stops a cybervirus from destroying the country. This is scary, because people in Washington D.C. are going to read this book, believe the hacking portrayed has some basis in reality, and base policy on it. This "news analysis" piece in the New York Times is a good example, coming up with policy recommendations based on fictional cliches rather than a reality of what hackers do.

The cybervirus in the book is some all-powerful thing, able to infect everything everywhere without being detected. This is fantasy, no more real than magic and faeries. Sure, magic and faeries are a popular basis for fiction, but in this case, it's lazy fantasy, a cliche. In fiction, viruses are rarely portrayed as anything other than all-powerful.

But in the real world, viruses have important limitations. If you knew anything about computer viruses, rather than being impressed by what they can do, you'd be disappointed by what they can't.

Go look at your home router. See the blinky lights. The light flashes every time a packet of data goes across the network. Packets can't be sent without a light blinking. Likewise, viruses cannot spread themselves over a network, or communicate with each other, without somebody noticing -- especially a virus that's supposedly infected a billion devices as in the book.

The same is true of data on the disk. All the data is accounted for. It's rather easy for professionals to see when data (consisting of the virus) has been added. The difficulty of anti-virus software is not in detecting when something new has been added to a system, but automatically determining whether it's benign or malicious. When viruses are able to evade anti-virus detection, it's because they've been classified as non-hostile, not because they are invisible.

Such evasion only works when hackers have a focused target. As soon as a virus spreads too far, anti-virus companies will get a sample, classify it as malicious, and spread the "signatures" out to the world. That's what happened with Stuxnet, a focused attack on Iran's nuclear enrichment program that eventually spread too far and got detected. It's implausible that anything can spread to a billion systems without anti-virus companies getting a sample and correctly classifying it.

In the book, the president creates a team of the 30 brightest cybersecurity minds the country has, from government, the private sector, and even convicted hackers on parole from jail -- each more brilliant than the last. This is yet another lazy cliche about genius hackers.

The cliche comes from the fact that it's rather easy to impress muggles with magic tricks. As soon as somebody shows an ability to do something you don't know how to do, they become a cyber genius in your mind. The reality is that cybersecurity/hacking is no different than any other profession, no more dominated by "genius" than bridge engineering or heart surgery. It's a skill that takes both years of study as well as years of experience.

So whenever the president, ignorant of computers, puts together a team of 30 cyber geniuses, they aren't going to be people of competence. They are going to be people good at promoting themselves, taking credit for other people's work, or political engineering. They won't be technical experts, they'll be people like Rudi Giuliani or Richard Clarke, who have been tapped by presidents as cyber experts despite knowing less than nothing about computers.

A funny example of this is Marcus Hutchins. He's a virus researcher of typical skill and experience, but was catapulted to fame by finding the "kill switch" in the famous Wannacry virus. In truth, he just got lucky, being just the first to find the kill switch that would've soon been found by another researcher (it was pretty obvious). But the press set him up as one of the top 5 experts in the world. That's silly, because there is no such thing, like there's no "top 5 neurosurgeons" or "top 5 bridge engineers". Hutchins is certainly skilled enough to merit a solid 6 figure salary, but such "top cyber geniuses" don't exist.

I mention Hutchins because months after the famed Wannacry incident, he was arrested in conjunction with an unrelated Russian banking virus. Assuming everything in his indictment is true, it still makes him only a minor figure with a few youthful indiscretions. It's likely this confusion between "fame" and "cyber genius" catapulted him into being a major person of interest in their investigations.

The book discusses the recent major cyberattacks in the news, like Mirai, Wannacry, and nPetya, but they are distorted misunderstandings of what happened. For example, it explains DDoS:
A DDoS attack is a distribute denial-of-service attack. A flood attack, essentially, on the network of servers that convert the addresses we type into our browsers into IP numbers that the internet routers use.
This is only partially right, but mainly wrong. DDoS is any sort of flood from multiple sources distributed around the Internet, against any target. It's only the Mirai attack, the most recent famous DDoS, that attacked the name servers that convert addresses to numbers.

The same sort of misconceptions are rife in Washington. Mirai, Wannacry, and nPetya spawned a slew of policy recommendations that get the technical details wrong. Politicians reading this Clinton thriller will just get more wrong.

In terms of fiction, the lazy cliches and superficial understanding of cybersecurity will be hard for people of intelligence to stomach. However, the danger I want to point out is that people in Washington D.C., politicians who make policy, will read this book. Their understanding of how cyber works will come from such books. And it will be wrong.

by Robert Graham (noreply@blogger.com) at June 17, 2018 11:20 PM

June 16, 2018

Steve Kemp's Blog

Monkeying around with interpreters

Recently I've had an overwhelming desire to write a BASIC interpreter. I can't think why, but the idea popped into my mind, and wouldn't go away.

So I challenged myself to spend the weekend looking at it.

Writing an interpreter is a pretty well-understood problem:

  • Parse the input into tokens, such as "LET", "GOTO", "INT:3"
    • This is called lexical analysis / lexing.
  • Taking those tokens and building an abstract syntax tree.
    • The AST
  • Walking the tree, evaluating as you go.
    • Hey ho.

Of course BASIC is annoying because a program is prefixed by line-numbers, for example:

 10 PRINT "HELLO, WORLD"
 20 GOTO 10

The naive way of approaching this is to repeat the whole process for each line. So a program would consist of an array of input-strings, with each line being treated independently.

Anyway, reminding myself of all this fun took a few hours, and during the course of that time I came across Writing An Interpreter In Go, which seems to be well-regarded. The book walks you through creating an interpreter for a language called "Monkey".

I found a bunch of implementations, which were nice and clean. So to give myself something to do I started by adding a new built-in function rnd(). Then I tested this:

let r = 0;
let c = 0;

for( r != 50 ) {
   let r = rnd();
   let c = c + 1;
}

puts "It took ";
puts c;
puts " attempts to find a random-number equalling 50!";

Unfortunately this crashed. It crashed inside the body of the loop, and it seemed that the projects I looked at each handled the let statement in a slightly-odd way - the statement wouldn't return a value, and would instead fall-through a case statement, hitting the next implementation.

For example in monkey-intepreter we see that happen in this section. (Notice how there's no return after the env.Set call?)

So I reported this as a meta-bug to the book author. It might be that the master source is wrong, or it might be that unrelated individuals all made the same error - meaning the text is unclear.

Anyway the end result is I have a language, in go, that I think I understand and have been able to modify. Now I'll have to find some time to go back to BASIC-work.

I found a bunch of BASIC interpreters, including ubasic, but unfortunately almost all of them were missing many, many features - such as implementing operations like RND(), ABS(), COS().

Perhaps room for another interpreter after all!

June 16, 2018 11:01 AM

June 15, 2018

Everything Sysadmin

Google has changed GSuite's SPF records

(DNSControl unrolls your SPF records safely and automatically. Sure, you can do it manually, but at the end of this article I'll show you how to automate it so you can 'set it and forget it'.)

Google has changed the SPF records for GSuite. You don't have to make any changes since you still do include:_spf.google.com and the magic of SPF takes care of things for you.

However, if you unroll your SPF records to work around the 10-lookup limit, you need to take a look at what you've done and re-do it based on the new SPF records.

The change is simple: they've added two new CIDR blocks (ip4:35.191.0.0/16 and ip4:130.211.0.0/22) to _netblocks3.google.com.
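
If you do unroll by hand, one way to see the current netblocks for yourself (assuming you have dig available) is to query the TXT records behind the include directly:

dig +short txt _spf.google.com
dig +short txt _netblocks3.google.com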

99.99% of all GSuite users don't have to do anything.

by Tom Limoncelli at June 15, 2018 09:00 PM

Sean's IT Blog

Getting Started With UEM Part 2: Laying The Foundation – File Services

In my last post on UEM, I discussed the components and key considerations that go into deploying VMware UEM.  UEM is made up of multiple components that rely on a common infrastructure of file shares and Group Policy to manage the user environment, and in this post, we will cover how to deploy the file share infrastructure.

There are two file shares that we will be deploying.  These file shares are:

  • UEM Configuration File Share
  • UEM User Data Share

Configuration File Share

The first of the two UEM file shares is the configuration file share.  This file share holds the configuration data used by the UEM agent that is installed in the virtual desktops or RDSH servers.

The UEM configuration share contains a few important subfolders.  These subfolders are created by the UEM management console during its initial setup, and they align with various tabs in the UEM Management Console.  We will discuss this more in a future article on using the UEM Management console.

  • General – This is the primary subfolder on the configuration share, and it contains the main configuration files for the agent.
  • FlexRepository – This subfolder under General contains all of the settings configured on the “User Environment” tab.  The settings in this folder tell the UEM agent how to configure policies such as Application Blocking, Horizon Smart Policies, and ADMX-based settings.

Administrators can create their own subfolders for organizing application and Windows  personalization.  These are created in the user personalization tab, and when a folder is created in the UEM Management Console, it is also created on the configuration share.  Some folders that I use in my environment are:

  • Applications – This is the first subfolder underneath the General folder.  This folder contains the INI files that tell the UEM agent how to manage application personalization.  The Applications folder makes up one part of the “Personalization” tab.
  • Windows Settings – This folder contains the INI files that tell the UEM agent how to manage the Windows environment personalization.  The Windows Settings folder makes up the other part of the Personalization tab.

Some environments are a little more complex, and they require additional configuration sets for different use cases.  UEM can create a silo for specific settings that should only be applied to certain users or groups of machines.  A silo can have any folder structure you choose to set up – it can be a single application configuration file or it can be an entire set of configurations with multiple sub-folders.  Each silo also requires its own Group Policy configuration.

User Data File Share

The second UEM file share is the user data file share.  This file share holds the user data that is managed by UEM.  This is where any captured application profiles are stored. It can also contain other user data that may not be managed by UEM such as folders managed by Windows Folder Redirection.  I’ve seen instances where the UEM User Data Share also contained other data to provide a single location where all user data is stored.

The key thing to remember about this share is that it is a user data share.  These folders belong to the user, and they should be secured so that other users cannot access them.  IT administrators, system processes such as antivirus and backup engines, and, if allowed by policy, the helpdesk should also have access to these folders to support the environment.

User application settings data is stored on the share.  This consists of registry keys and files and folders from the local user profile.  When this data is captured by the UEM agent, it is compressed in a zip file before being written out to the network.  The user data folder also can contain backup copies of user settings, so if an application gets corrupted, the helpdesk or the user themselves can easily roll back to the last configuration.

UEM also allows log data to be stored on the user data share.  The log contains information about activities that the UEM agent performs during logon, application launch and close, and logoff, and it provides a wealth of troubleshooting information for administrators.

UEM Shared Folder Replication

VMware UEM is perfect for multi-site end-user computing environments because it only reads settings and data at logon and writes back to the share at user logoff.  If FlexDirect is enabled for applications, it will also read during an application launch and write back when the last instance of the application is closed.  This means that it is possible to replicate UEM data to other file shares, and the risk of file corruption is minimized due to file locks being minimized.

Both the UEM Configuration Share and the UEM User Data share can be replicated using various file replication technologies.

DFS Namespaces

As environments grow or servers are retired, this UEM data may need to be moved to new locations.  Or it may need to exist in multiple locations to support multiple sites.  In order to simplify the configuration of UEM and minimize the number of changes that are required to Group Policy or other configurations, I recommend using DFS Namespaces to provide a single namespace for the file shares.  This allows users to use a single path to access the file shares regardless of their location or the servers that the data is located on.

UEM Share Permissions

It’s not safe assume that everyone is using Windows-based file servers to provide file services in their environment.  Because of that, setting up network shares is beyond the scope of this post.  The process of creating the share and applying security varies based on the device hosting the share.

The required Share and NTFS/File permissions are listed in the table below. These contain the basic permissions that are required to use UEM.  The share permissions required for the HelpDesk tool are not included in the table.

UEMConfiguration share
  Share Permissions:
    • Administrators: Full Control
    • UEM Admins: Change
    • Authenticated Users: Read
  NTFS Permissions:
    • Administrators: Full Control
    • UEM Admins: Full Control
    • Authenticated Users: Read and Execute

UserData share
  Share Permissions:
    • Administrators: Full Control
    • UEM Admins: Full Control
    • Authenticated Users: Change
  NTFS Permissions:
    • Administrators: Full Control
    • UEM Admins: Full Control
    • Authenticated Users (This folder only): Read and Execute, Create Folders/Append Data
    • Creator Owner (Subfolders and files only): Full Control

Wrapup and Next Steps

This post just provided a basic overview of the required UEM file shares and user permissions.  If you’re planning to do a multi-site environment or have multiple servers, this would be a good time to configure replication.

The next post in this series will cover the setup and initial configuration of the UEM management infrastructure.  This includes setting up the management console and configuring Group Policy.

by seanpmassey at June 15, 2018 01:03 PM

June 14, 2018

Raymii.org

snap install mosaic, the first graphical webbrowser on Ubuntu

On one of my favorite podcasts from Jupiter Broadcasting, either Linux Action News or Linux Unplugged (252 I think, not sure), Alan Pope was talking about Snap packages and how there are now WinePacks, a snap with Wine and a single (Windows) application packaged. During the discussion he dropped that Mosaic, the first graphical web browser, is available as a snap package for modern distributions. I installed it, and after a huge download (1.5 MB), playing around with it is quite fun. In this post I'll discuss how to install it, and what works and what doesn't in the modern age on Ubuntu 18.04.
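
(For the impatient, the install really is the one-liner from the title; on Ubuntu 18.04 snapd is already present, so this should be all that's needed:)

sudo snap install mosaic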

June 14, 2018 12:00 AM

June 13, 2018

Evaggelos Balaskas

Terraform Gandi

This blog post contains my notes on working with Gandi through Terraform. I've replaced my domain name with example.com, but pretty much everything should work as advertised.

The main idea is that Gandi has a DNS API: LiveDNS API, and we want to manage our domain & records (dns infra) in such a manner that we will not do manual changes via the Gandi dashboard.

 

Terraform

Although this is partially a terraform blog post, I will not go into much detail on terraform. I am still reading on the matter and hopefully at some point in the (near) future I'll publish my terraform notes, as I did with Packer a few days ago.

 

Installation

Download the latest terraform release (a static 64bit golang binary) and install it on our system

$ curl -sLO https://releases.hashicorp.com/terraform/0.11.7/terraform_0.11.7_linux_amd64.zip
$ unzip terraform_0.11.7_linux_amd64.zip
$ sudo mv terraform /usr/local/bin/

 

Version

Verify terraform by checking the version

$ terraform version
Terraform v0.11.7

 

Terraform Gandi Provider

There is a community terraform provider for gandi: Terraform provider for the Gandi LiveDNS by Sébastien Maccagnoni (aka tiramiseb) that is simple and straightforward.

 

Build

To build the provider, follow the notes on README

You can build the gandi provider on any distro and just copy the binary to your primary machine/server or build box.
Below are my personal (docker) notes:

$  mkdir -pv /root/go/src/
$  cd /root/go/src/

$  git clone https://github.com/tiramiseb/terraform-provider-gandi.git 

Cloning into 'terraform-provider-gandi'...
remote: Counting objects: 23, done.
remote: Total 23 (delta 0), reused 0 (delta 0), pack-reused 23
Unpacking objects: 100% (23/23), done.

$  cd terraform-provider-gandi/

$  go get
$  go build -o terraform-provider-gandi

$  ls -l terraform-provider-gandi
-rwxr-xr-x 1 root root 25788936 Jun 12 16:52 terraform-provider-gandi

Copy terraform-provider-gandi to the same directory as terraform binary.
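
A sketch of that step, assuming the same /usr/local/bin location we used for the terraform binary above (with terraform 0.11, dropping third-party provider binaries into ~/.terraform.d/plugins/ should also work):

$ sudo cp terraform-provider-gandi /usr/local/bin/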

 

Gandi API Token

Log in to your gandi account and go to Security

Gandi Security

and retrieve your API token

Gandi Token

The Token should be a long alphanumeric string.

 

Repo Structure

Let’s create a simple repo structure. Terraform will read all files from our directory that ends with .tf

$ tree
.
├── main.tf
└── vars.tf
  • main.tf will hold our dns infra
  • vars.tf will have our variables

 

Files

vars.tf

variable "gandi_api_token" {
    description = "A Gandi API token"
}

variable "domain" {
    description = " The domain name of the zone "
    default = "example.com"
}

variable "TTL" {
    description = " The default TTL of zone & records "
    default = "3600"
}

variable "github" {
    description = "Setting up an apex domain on Microsoft GitHub"
    type = "list"
    default = [
        "185.199.108.153",
        "185.199.109.153",
        "185.199.110.153",
        "185.199.111.153"
    ]
}

 

main.tf

# Gandi
provider "gandi" {
  key = "${var.gandi_api_token}"
}

# Zone
resource "gandi_zone" "domain_tld" {
    name = "${var.domain} Zone"
}

# Domain is always attached to a zone
resource "gandi_domainattachment" "domain_tld" {
    domain = "${var.domain}"
    zone = "${gandi_zone.domain_tld.id}"
}

# DNS Records

resource "gandi_zonerecord" "mx" {
  zone = "${gandi_zone.domain_tld.id}"
  name = "@"
  type = "MX"
  ttl = "${var.TTL}"
  values = [ "10 example.com."]
}

resource "gandi_zonerecord" "web" {
  zone = "${gandi_zone.domain_tld.id}"
  name = "web"
  type = "CNAME"
  ttl = "${var.TTL}"
  values = [ "test.example.com." ]
}

resource "gandi_zonerecord" "www" {
  zone = "${gandi_zone.domain_tld.id}"
  name = "www"
  type = "CNAME"
  ttl = "${var.TTL}"
  values = [ "${var.domain}." ]
}

resource "gandi_zonerecord" "origin" {
  zone = "${gandi_zone.domain_tld.id}"
  name = "@"
  type = "A"
  ttl = "${var.TTL}"
  values = [ "${var.github}" ]
}

 

Variables

By declaring these variables, in vars.tf, we can use them in main.tf.

  • gandi_api_token - The Gandi API Token
  • domain - The Domain Name of the zone
  • TTL - The default TimeToLive for the zone and records
  • github - This is a list of IPs that we want to use for our site.

 

Main

Our zone should have four DNS records. The gandi_zonerecord is the terraform resource type and the second part is our local identifier. Although it isn't obvious yet, the last record, named "origin", will contain all four of the IPs from github.

  • "gandi_zonerecord" "mx"
  • "gandi_zonerecord" "web"
  • "gandi_zonerecord" "www"
  • "gandi_zonerecord" "origin"

 

Zone

In other (dns) words , the state of our zone should be:

example.com.        3600    IN    MX       10 example.com
web.example.com.    3600    IN    CNAME    test.example.com.
www.example.com.    3600    IN    CNAME    example.com.
example.com.        3600    IN    A        185.199.108.153
example.com.        3600    IN    A        185.199.109.153
example.com.        3600    IN    A        185.199.110.153
example.com.        3600    IN    A        185.199.111.153

 

Environment

We haven’t yet declared anywhere in our files the gandi api token. This is by design. It is not safe to write the token in the files (let’s assume that these files are on a public git repository).

So instead, we can either type it on the command line as we run terraform to create, change or delete our dns infra, or we can pass it through an environment variable.

export TF_VAR_gandi_api_token="XXXXXXXX"

 

Verbose Logging

I prefer to have debug on, and to append all messages to a log file:

export TF_LOG="DEBUG"
export TF_LOG_PATH=./terraform.log

 

Initialize

Ready to start with our setup. First things first, let's initialize our repo.

terraform init

the output should be:

Initializing provider plugins...

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

 

Planning

Next thing, we have to plan!

terraform plan

First line is:

Refreshing Terraform state in-memory prior to plan...

the rest should be:

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  + gandi_domainattachment.domain_tld
      id:                <computed>
      domain:            "example.com"
      zone:              "${gandi_zone.domain_tld.id}"

  + gandi_zone.domain_tld
      id:                <computed>
      name:              "example.com Zone"

  + gandi_zonerecord.mx
      id:                <computed>
      name:              "@"
      ttl:               "3600"
      type:              "MX"
      values.#:          "1"
      values.3522983148: "10 example.com."
      zone:              "${gandi_zone.domain_tld.id}"

  + gandi_zonerecord.origin
      id:                <computed>
      name:              "@"
      ttl:               "3600"
      type:              "A"
      values.#:          "4"
      values.1201759686: "185.199.109.153"
      values.226880543:  "185.199.111.153"
      values.2365437539: "185.199.108.153"
      values.3336126394: "185.199.110.153"
      zone:              "${gandi_zone.domain_tld.id}"

  + gandi_zonerecord.web
      id:                <computed>
      name:              "web"
      ttl:               "3600"
      type:              "CNAME"
      values.#:          "1"
      values.921960212:  "test.example.com."
      zone:              "${gandi_zone.domain_tld.id}"

  + gandi_zonerecord.www
      id:                <computed>
      name:              "www"
      ttl:               "3600"
      type:              "CNAME"
      values.#:          "1"
      values.3477242478: "example.com."
      zone:              "${gandi_zone.domain_tld.id}"

Plan: 6 to add, 0 to change, 0 to destroy.

so the plan is Plan: 6 to add !

 

State

Let’s get back to this msg.

Refreshing Terraform state in-memory prior to plan...

Terraform is telling us that it is refreshing the state.
What does this mean?

Terraform is Declarative.

That means that terraform is only interested in implementing our plan, but it needs to know the previous state of our infrastructure. It will only create new records, update records (if needed), or delete deprecated records. Even so, it needs to know the current state of our dns infra (zone/records).

Terraforming (going by the definition of the word) is the process of deliberately modifying the current state of our infrastructure.

 

Import

So we need to pull the current state into a local state file and re-plan our terraformation.

$ terraform import gandi_domainattachment.domain_tld example.com
gandi_domainattachment.domain_tld: Importing from ID "example.com"...
gandi_domainattachment.domain_tld: Import complete!
  Imported gandi_domainattachment (ID: example.com)
gandi_domainattachment.domain_tld: Refreshing state... (ID: example.com)

Import successful!

The resources that were imported are shown above. These resources are now in
your Terraform state and will henceforth be managed by Terraform.

How does import work?

The current state of our domain (zone & records) has specific identifiers. We need to map our local IDs to the remote ones, and all the info will update the terraform state.

So the previous import command has three parts:

Gandi Resource        .Local ID    Remote ID
gandi_domainattachment.domain_tld  example.com

Terraform State

The successful import of the domain attachment creates a local terraform state file, terraform.tfstate:

$ cat terraform.tfstate 
{
    "version": 3,
    "terraform_version": "0.11.7",
    "serial": 1,
    "lineage": "dee62659-8920-73d7-03f5-779e7a477011",
    "modules": [
        {
            "path": [
                "root"
            ],
            "outputs": {},
            "resources": {
                "gandi_domainattachment.domain_tld": {
                    "type": "gandi_domainattachment",
                    "depends_on": [],
                    "primary": {
                        "id": "example.com",
                        "attributes": {
                            "domain": "example.com",
                            "id": "example.com",
                            "zone": "XXXXXXXX-6bd2-11e8-XXXX-00163ee24379"
                        },
                        "meta": {},
                        "tainted": false
                    },
                    "deposed": [],
                    "provider": "provider.gandi"
                }
            },
            "depends_on": []
        }
    ]
}

 

Import All Resources

Reading through the state file, we see that our zone also has an ID:

"zone": "XXXXXXXX-6bd2-11e8-XXXX-00163ee24379"

We should use this ID to import all resources.

 

Zone Resource

Import the gandi zone resource:

terraform import gandi_zone.domain_tld XXXXXXXX-6bd2-11e8-XXXX-00163ee24379

 

DNS Records

As we can see above in the DNS section, we have four (4) dns records, and when importing resources we need to add their path after the ID.

eg.

for MX is /@/MX
for web is /web/CNAME
etc

terraform import gandi_zonerecord.mx     XXXXXXXX-6bd2-11e8-XXXX-00163ee24379/@/MX
terraform import gandi_zonerecord.web    XXXXXXXX-6bd2-11e8-XXXX-00163ee24379/web/CNAME
terraform import gandi_zonerecord.www    XXXXXXXX-6bd2-11e8-XXXX-00163ee24379/www/CNAME
terraform import gandi_zonerecord.origin XXXXXXXX-6bd2-11e8-XXXX-00163ee24379/@/A
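
After the imports, as a quick sanity check, terraform can list everything that is now tracked in the local state; assuming all of the above succeeded, the output should look something like:

$ terraform state list
gandi_domainattachment.domain_tld
gandi_zone.domain_tld
gandi_zonerecord.mx
gandi_zonerecord.origin
gandi_zonerecord.web
gandi_zonerecord.www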

 

Re-Planning

Okay, we have imported our dns infra state to a local file.
Time to plan once more:

$ terraform plan

Plan: 2 to add, 1 to change, 0 to destroy.

 

Save Planning

We can save our plan:

$ terraform plan -out terraform.tfplan

 

Apply aka run our plan

We can now apply our plan to our dns infra, the gandi provider.

$ terraform apply
Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: 

To Continue, we need to type: yes

 

Non Interactive

or we can use our already saved plan to run without asking:

$ terraform apply "terraform.tfplan"
gandi_zone.domain_tld: Modifying... (ID: XXXXXXXX-6bd2-11e8-XXXX-00163ee24379)
  name: "example.com zone" => "example.com Zone"
gandi_zone.domain_tld: Modifications complete after 2s (ID: XXXXXXXX-6bd2-11e8-XXXX-00163ee24379)
gandi_domainattachment.domain_tld: Creating...
  domain: "" => "example.com"
  zone:   "" => "XXXXXXXX-6bd2-11e8-XXXX-00163ee24379"
gandi_zonerecord.www: Creating...
  name:              "" => "www"
  ttl:               "" => "3600"
  type:              "" => "CNAME"
  values.#:          "" => "1"
  values.3477242478: "" => "example.com."
  zone:              "" => "XXXXXXXX-6bd2-11e8-XXXX-00163ee24379"
gandi_domainattachment.domain_tld: Creation complete after 0s (ID: example.com)
gandi_zonerecord.www: Creation complete after 1s (ID: XXXXXXXX-6bd2-11e8-XXXX-00163ee24379/www/CNAME)

Apply complete! Resources: 2 added, 1 changed, 0 destroyed.
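
Once the apply is done, and assuming the domain is already delegated to Gandi's LiveDNS nameservers, a quick spot check with dig (substituting your real domain for example.com) should return the values from the Zone section above:

$ dig +short CNAME www.example.com
$ dig +short A example.com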

 

Tag(s): terraform, gandi

June 13, 2018 04:27 PM

June 12, 2018

Raymii.org

Chrome 68 is deprecating HPKP (HTTP Public Key Pinning)

In 2014 I published an article on HPKP, http public key pinning. It allows a site operator to send a public key in an http header, forcing the browser to only accept that key for the site on subsequent connections. It was meant to reduce the risk of a compromised certificate authority (since any CA can create a certificate for any website). Quite secure, but it was often wrongly configured, forgotten until certificates expired, and there were some security issues like a false pin. In late 2017 Google announced that HPKP would be removed in Chrome 68, and that version is released now, so HPKP is no longer supported. This post goes into the reasoning behind the removal, the possible replacement (Expect-CT) and how to remove HPKP from your site.

June 12, 2018 12:00 AM

June 08, 2018

Evaggelos Balaskas

Packer by HashiCorp

 

Packer is an open source tool for creating identical machine images for multiple platforms from a single source configuration

 

Installation

in archlinux the package name is: packer-io

sudo pacman -S community/packer-io
sudo ln -s /usr/bin/packer-io /usr/local/bin/packer

on any generic 64bit linux:

$ curl -sLO https://releases.hashicorp.com/packer/1.2.4/packer_1.2.4_linux_amd64.zip

$ unzip packer_1.2.4_linux_amd64.zip
$ chmod +x packer
$ sudo mv packer /usr/local/bin/packer

 

Version

$ packer -v
1.2.4

or

$ packer --version
1.2.4

or

$ packer version
Packer v1.2.4

or

$ packer -machine-readable version
1528019302,,version,1.2.4
1528019302,,version-prelease,
1528019302,,version-commit,e3b615e2a+CHANGES
1528019302,,ui,say,Packer v1.2.4

 

Help

$ packer --help
Usage: packer [--version] [--help] <command> [<args>]

Available commands are:
    build       build image(s) from template
    fix         fixes templates from old versions of packer
    inspect     see components of a template
    push        push a template and supporting files to a Packer build service
    validate    check that a template is valid
    version     Prints the Packer version

 

Help Validate

$ packer --help validate
Usage: packer validate [options] TEMPLATE

  Checks the template is valid by parsing the template and also
  checking the configuration with the various builders, provisioners, etc.

  If it is not valid, the errors will be shown and the command will exit
  with a non-zero exit status. If it is valid, it will exit with a zero
  exit status.

Options:

  -syntax-only           Only check syntax. Do not verify config of the template.
  -except=foo,bar,baz    Validate all builds other than these
  -only=foo,bar,baz      Validate only these builds
  -var 'key=value'       Variable for templates, can be used multiple times.
  -var-file=path         JSON file containing user variables.

 

Help Inspect

Usage: packer inspect TEMPLATE

  Inspects a template, parsing and outputting the components a template
  defines. This does not validate the contents of a template (other than
  basic syntax by necessity).

Options:

  -machine-readable  Machine-readable output

 

Help Build

$ packer --help build

Usage: packer build [options] TEMPLATE

  Will execute multiple builds in parallel as defined in the template.
  The various artifacts created by the template will be outputted.

Options:

  -color=false               Disable color output (on by default)
  -debug                     Debug mode enabled for builds
  -except=foo,bar,baz        Build all builds other than these
  -only=foo,bar,baz          Build only the specified builds
  -force                     Force a build to continue if artifacts exist, deletes existing artifacts
  -machine-readable          Machine-readable output
  -on-error=[cleanup|abort|ask] If the build fails do: clean up (default), abort, or ask
  -parallel=false            Disable parallelization (on by default)
  -var 'key=value'           Variable for templates, can be used multiple times.
  -var-file=path             JSON file containing user variables.

 

Autocompletion

To enable autocompletion

$ packer -autocomplete-install

 

Workflow

.. and terminology.

Packer uses Templates, which are json files that carry the configuration for the various tasks. The core task is the Build. In this stage, Packer uses a Builder to create a machine image for a single platform, eg. the Qemu Builder to create a kvm/xen virtual machine image. The next stage is provisioning. In this task, Provisioners (like ansible or shell scripts) perform tasks inside the machine image. When finished, Post-processors handle the final tasks, such as compressing the virtual image or importing it into a specific provider.

packer

 

Template

A JSON template file contains:

  • builders (required)
  • description (optional)
  • variables (optional)
  • min_packer_version (optional)
  • provisioners (optional)
  • post-processors (optional)

Comments are also supported, but only as root-level keys.

eg.

{
  "_comment": "This is a comment",

  "builders": [
    {}
  ]
}

 

Template Example

eg. Qemu Builder

qemu_example.json

{
  "_comment": "This is a qemu builder example",

  "builders": [
    {
        "type": "qemu"
    }
  ]
}

 

Validate

Syntax Only

$ packer validate -syntax-only  qemu_example.json 
Syntax-only check passed. Everything looks okay.

 

Validate Template

$ packer validate qemu_example.json
Template validation failed. Errors are shown below.

Errors validating build 'qemu'. 2 error(s) occurred:

* One of iso_url or iso_urls must be specified.
* An ssh_username must be specified
  Note: some builders used to default ssh_username to "root".


 

Debugging

To enable verbose logging on the console, type:

$ export PACKER_LOG=1
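
Optionally, the verbose log can also be redirected to a file by setting PACKER_LOG_PATH (the filename here is just an example):

$ export PACKER_LOG=1
$ export PACKER_LOG_PATH=packer_debug.log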

 

Variables

user variables

It is really simple to use variables inside the packer template:

  "variables": {
    "centos_version":  "7.5",
  }    

and use the variable as:

"{{user `centos_version`}}",

 

Description

We can add a description declaration at the top of our template:

eg.

  "description": "tMinimal CentOS 7 Qemu Imagen__________________________________________",

and verify it when inspecting the template.

 

QEMU Builder

The full documentation on the QEMU builder can be found here

Qemu template example

Try to keep things simple. Here is an example setup for building a CentOS 7.5 image with packer via qemu.

$ cat qemu_example.json
{

  "_comment": "This is a CentOS 7.5 Qemu Builder example",

  "description": "tMinimal CentOS 7 Qemu Imagen__________________________________________",

  "variables": {
    "7.5":      "1804",
    "checksum": "714acc0aefb32b7d51b515e25546835e55a90da9fb00417fbee2d03a62801efd"
  },

  "builders": [
    {
        "type": "qemu",

        "iso_url": "http://ftp.otenet.gr/linux/centos/7/isos/x86_64/CentOS-7-x86_64-Minimal-{{user `7.5`}}.iso",
        "iso_checksum": "{{user `checksum`}}",
        "iso_checksum_type": "sha256",

        "communicator": "none"
    }
  ]

}

 

Communicator

There are three basic communicators:

  • none
  • Secure Shell (SSH)
  • WinRM

that are configured within the builder section.

Communicators are used during the provisioning stage for uploading files or executing scripts. If you are not doing any provisioning, choosing none instead of the default ssh disables that feature.

"communicator": "none"

 

iso_url

can be an HTTP URL or a path to a local file. When starting to work with packer, it is useful to keep the ISO file local, so packer does not try to download it from the internet on every trial-and-error step.

eg.

"iso_url": "/home/ebal/Downloads/CentOS-7-x86_64-Minimal-{{user `7.5`}}.iso"

 

Inspect Template

$ packer inspect qemu_example.json
Description:

    Minimal CentOS 7 Qemu Image
__________________________________________

Optional variables and their defaults:

  7.5      = 1804
  checksum = 714acc0aefb32b7d51b515e25546835e55a90da9fb00417fbee2d03a62801efd

Builders:

  qemu

Provisioners:

  <No provisioners>

Note: If your build names contain user variables or template
functions such as 'timestamp', these are processed at build time,
and therefore only show in their raw form here.

Validate Syntax Only

$ packer validate -syntax-only qemu_example.json
Syntax-only check passed. Everything looks okay.

Validate

$ packer validate qemu_example.json
Template validated successfully.

 

Build

Initial Build

$ packer build qemu_example.json

 

packer build

 

Build output

The first packer build output should look like this:

qemu output will be in this color.

==> qemu: Downloading or copying ISO
    qemu: Downloading or copying: file:///home/ebal/Downloads/CentOS-7-x86_64-Minimal-1804.iso
==> qemu: Creating hard drive...
==> qemu: Looking for available port between 5900 and 6000 on 127.0.0.1
==> qemu: Starting VM, booting from CD-ROM
==> qemu: Waiting 10s for boot...
==> qemu: Connecting to VM via VNC
==> qemu: Typing the boot command over VNC...
==> qemu: Waiting for shutdown...
==> qemu: Converting hard drive...
Build 'qemu' finished.

Use ctrl+c to break and exit the packer build.

 

Automated Installation

The ideal scenario is to automate the entire process, using a Kickstart file to describe the initial CentOS installation. The kickstart reference guide can be found here.

In this example, the kickstart file CentOS7-ks.cfg can be used.

In the JSON template file, add the configuration below:

  "boot_command":[
    "<tab> text ",
    "ks=https://raw.githubusercontent.com/ebal/confs/master/Kickstart/CentOS7-ks.cfg ",
     "nameserver=9.9.9.9 ",
     "<enter><wait> "
],
  "boot_wait": "0s"

That tells packer not to wait for user input and to use the specified kickstart file instead.
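
The contents of that kickstart file are not reproduced here. As a rough, illustrative sketch only (not the author's actual CentOS7-ks.cfg), a minimal unattended CentOS 7 kickstart could look something like the following; note that the root password should match the ssh_password used later in the template when SSH provisioning is enabled:

install
text
lang en_US.UTF-8
keyboard us
timezone UTC
rootpw --plaintext password
network --bootproto=dhcp --activate
zerombr
clearpart --all --initlabel
autopart
reboot

%packages
@core
%end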

 

packer build with ks

 

http_directory

It is possible to retrieve the kickstart file from an internal HTTP server that packer can create, which is useful when building an image in an environment without internet access. Enable this feature by declaring a directory path with http_directory:

Path to a directory to serve using an HTTP server. The files in this directory will be available over HTTP and will be requestable from the virtual machine.

eg.

  "http_directory": "/home/ebal/Downloads/",
  "http_port_min": "8090",
  "http_port_max": "8100",

with that, the previous boot command should be written as:

"boot_command":[
    "<tab> text ",
    "ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/CentOS7-ks.cfg ",
    "<enter><wait>"
],
    "boot_wait": "0s"

 

packer build with httpdir

 

Timeout

A “well known” error with packer is the Waiting for shutdown timeout error.

eg.

==> qemu: Waiting for shutdown...
==> qemu: Failed to shutdown
==> qemu: Deleting output directory...
Build 'qemu' errored: Failed to shutdown

==> Some builds didn't complete successfully and had errors:
--> qemu: Failed to shutdown

To bypass this error, change shutdown_timeout to something greater than the default value, which is 5m (five minutes).

eg.

"shutdown_timeout": "30m"

ssh

Sometimes the timeout error occurs on the SSH attempts. If you are using ssh as the communicator, also change the value below:

"ssh_timeout": "30m",

 

qemu_example.json

This is a working template file:


{

  "_comment": "This is a CentOS 7.5 Qemu Builder example",

  "description": "tMinimal CentOS 7 Qemu Imagen__________________________________________",

  "variables": {
    "7.5":      "1804",
    "checksum": "714acc0aefb32b7d51b515e25546835e55a90da9fb00417fbee2d03a62801efd"
  },

  "builders": [
    {
        "type": "qemu",

        "iso_url": "/home/ebal/Downloads/CentOS-7-x86_64-Minimal-{{user `7.5`}}.iso",
        "iso_checksum": "{{user `checksum`}}",
        "iso_checksum_type": "sha256",

        "communicator": "none",

        "boot_command":[
          "<tab> text ",
          "ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/CentOS7-ks.cfg ",
          "nameserver=9.9.9.9 ",
          "<enter><wait> "
        ],
        "boot_wait": "0s",

        "http_directory": "/home/ebal/Downloads/",
        "http_port_min": "8090",
        "http_port_max": "8100",

        "shutdown_timeout": "20m"

    }
  ]

}

 

build

packer build qemu_example.json

 

Verify

and when the installation is finished, check the output folder & image:

$ ls
output-qemu  packer_cache  qemu_example.json

$ ls output-qemu/
packer-qemu

$ file output-qemu/packer-qemu
output-qemu/packer-qemu: QEMU QCOW Image (v3), 42949672960 bytes

$ du -sh output-qemu/packer-qemu
1.7G    output-qemu/packer-qemu

$ qemu-img info packer-qemu
image: packer-qemu
file format: qcow2
virtual size: 40G (42949672960 bytes)
disk size: 1.7G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
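
As an optional sanity check outside the packer workflow, the freshly built image can also be booted directly with qemu (memory size and paths here are illustrative):

$ qemu-system-x86_64 -enable-kvm -m 1024 \
    -drive file=output-qemu/packer-qemu,if=virtio,format=qcow2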

 

KVM

The default qemu/kvm builder will run something like this:

/usr/bin/qemu-system-x86_64
  -cdrom /home/ebal/Downloads/CentOS-7-x86_64-Minimal-1804.iso
  -name packer-qemu -display sdl
  -netdev user,id=user.0
  -vnc 127.0.0.1:32
  -machine type=pc,accel=kvm
  -device virtio-net,netdev=user.0
  -drive file=output-qemu/packer-qemu,if=virtio,cache=writeback,discard=ignore,format=qcow2
  -boot once=d
  -m 512M

In the builder section those qemu/kvm settings can be changed.

Using variables:

eg.

   "virtual_name": "centos7min.qcow2",
   "virtual_dir":  "centos7",
   "virtual_size": "20480",
   "virtual_mem":  "4096M"

In Qemu Builder:

  "accelerator": "kvm",
  "disk_size":   "{{ user `virtual_size` }}",
  "format":      "qcow2",
  "qemuargs":[
    [  "-m",  "{{ user `virtual_mem` }}" ]
  ],

  "vm_name":          "{{ user `virtual_name` }}",
  "output_directory": "{{ user `virtual_dir` }}"

 

Headless

There is no need for packer to use a display. This is really useful when running packer on a remote machine. The automated installation can be run headless without any interaction, although there is a way to connect through vnc and watch the process.

To enable a headless setup:

"headless": true

Serial

When working with a headless installation, perhaps through a command line interface on a remote machine, VNC may not be practical. Instead, there is a way to use qemu's serial output. To do that, we must pass some extra qemu arguments:

eg.

  "qemuargs":[
      [ "-m",      "{{ user `virtual_mem` }}" ],
      [ "-serial", "file:serial.out" ]
    ],

and also pass an extra (kernel) argument console=ttyS0,115200n8 to the boot command:

  "boot_command":[
    "<tab> text ",
    "console=ttyS0,115200n8 ",
    "ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/CentOS7-ks.cfg ",
    "nameserver=9.9.9.9 ",
    "<enter><wait> "
  ],
  "boot_wait": "0s",

The serial output:

To see the serial output:

$ tail -f serial.out

packer build with serial output

 

Post-Processors

When the machine image is finished, Packer can run final tasks such as compressing the image or importing it into a cloud provider.

The simplest way to get familiar with post-processors is to use compress:

  "post-processors":[
      {
          "type":   "compress",
          "format": "lz4",
          "output": "{{.BuildName}}.lz4"
      }
  ]

 

output

So here is the output:

$ packer build qemu_example.json 
qemu output will be in this color.

==> qemu: Downloading or copying ISO
    qemu: Downloading or copying: file:///home/ebal/Downloads/CentOS-7-x86_64-Minimal-1804.iso
==> qemu: Creating hard drive...
==> qemu: Starting HTTP server on port 8099
==> qemu: Looking for available port between 5900 and 6000 on 127.0.0.1
==> qemu: Starting VM, booting from CD-ROM
    qemu: The VM will be run headless, without a GUI. If you want to
    qemu: view the screen of the VM, connect via VNC without a password to
    qemu: vnc://127.0.0.1:5982
==> qemu: Overriding defaults Qemu arguments with QemuArgs...
==> qemu: Connecting to VM via VNC
==> qemu: Typing the boot command over VNC...
==> qemu: Waiting for shutdown...
==> qemu: Converting hard drive...
==> qemu: Running post-processor: compress
==> qemu (compress): Using lz4 compression with 4 cores for qemu.lz4
==> qemu (compress): Archiving centos7/centos7min.qcow2 with lz4
==> qemu (compress): Archive qemu.lz4 completed
Build 'qemu' finished.

==> Builds finished. The artifacts of successful builds are:
--> qemu: compressed artifacts in: qemu.lz4

 

info

After archiving the centos7min image, the output_directory and the original qemu image are deleted.

$ qemu-img info ./centos7/centos7min.qcow2

image: ./centos7/centos7min.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 1.5G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

$ du -h qemu.lz4
992M    qemu.lz4

 

Provisioners

Last but -surely- not least, packer supports Provisioners.
Provisioners are commonly used for:

  • installing packages
  • patching the kernel
  • creating users
  • downloading application code

and can be local shell scripts or more advanced tools like Ansible, Puppet, Chef or even PowerShell.

 

Ansible

So here is an ansible example:

$ tree testrole
testrole
├── defaults
│   └── main.yml
├── files
│   └── main.yml
├── handlers
│   └── main.yml
├── meta
│   └── main.yml
├── tasks
│   └── main.yml
├── templates
│   └── main.yml
└── vars
    └── main.yml

7 directories, 7 files
$ cat testrole/tasks/main.yml 
---
  - name: Debug that our ansible role is working
    debug:
      msg: "It Works !"

  - name: Install the Extra Packages for Enterprise Linux repository
    yum:
      name: epel-release
      state: present

  - name: upgrade all packages
    yum:
      name: '*'
      state: latest

So this ansible role will install the EPEL repository and upgrade all packages in our image.

template


    "variables":{
        "playbook_name": "testrole.yml"
    },

...

    "provisioners":[
        {
            "type":          "ansible",
            "playbook_file": "{{ user `playbook_name` }}"
        }
    ],

Communicator

Ansible needs to ssh into this machine to provision it. It is time to change the communicator from none to ssh.

  "communicator": "ssh",

We also need to add the SSH username/password to the template file:

      "ssh_username": "root",
      "ssh_password": "password",
      "ssh_timeout":  "3600s",

 

output

$ packer build qemu_example.json
qemu output will be in this color.

==> qemu: Downloading or copying ISO
    qemu: Downloading or copying: file:///home/ebal/Downloads/CentOS-7-x86_64-Minimal-1804.iso
==> qemu: Creating hard drive...
==> qemu: Starting HTTP server on port 8100
==> qemu: Found port for communicator (SSH, WinRM, etc): 4105.
==> qemu: Looking for available port between 5900 and 6000 on 127.0.0.1
==> qemu: Starting VM, booting from CD-ROM
    qemu: The VM will be run headless, without a GUI. If you want to
    qemu: view the screen of the VM, connect via VNC without a password to
    qemu: vnc://127.0.0.1:5990
==> qemu: Overriding defaults Qemu arguments with QemuArgs...
==> qemu: Connecting to VM via VNC
==> qemu: Typing the boot command over VNC...
==> qemu: Waiting for SSH to become available...
==> qemu: Connected to SSH!
==> qemu: Provisioning with Ansible...
==> qemu: Executing Ansible: ansible-playbook --extra-vars packer_build_name=qemu packer_builder_type=qemu -i /tmp/packer-provisioner-ansible594660041 /opt/hashicorp/packer/testrole.yml -e ansible_ssh_private_key_file=/tmp/ansible-key802434194
    qemu:
    qemu: PLAY [all] *********************************************************************
    qemu:
    qemu: TASK [testrole : Debug that our ansible role is working] ***********************
    qemu: ok: [default] => {
    qemu:     "msg": "It Works !"
    qemu: }
    qemu:
    qemu: TASK [testrole : Install the Extra Packages for Enterprise Linux repository] ***
    qemu: changed: [default]
    qemu:
    qemu: TASK [testrole : upgrade all packages] *****************************************
    qemu: changed: [default]
    qemu:
    qemu: PLAY RECAP *********************************************************************
    qemu: default                    : ok=3    changed=2    unreachable=0    failed=0
    qemu:
==> qemu: Halting the virtual machine...
==> qemu: Converting hard drive...
==> qemu: Running post-processor: compress
==> qemu (compress): Using lz4 compression with 4 cores for qemu.lz4
==> qemu (compress): Archiving centos7/centos7min.qcow2 with lz4
==> qemu (compress): Archive qemu.lz4 completed
Build 'qemu' finished.

==> Builds finished. The artifacts of successful builds are:
--> qemu: compressed artifacts in: qemu.lz4

 

Appendix

Here is the entire qemu template file:

qemu_example.json

{

  "_comment": "This is a CentOS 7.5 Qemu Builder example",

  "description": "tMinimal CentOS 7 Qemu Imagen__________________________________________",

  "variables": {
    "7.5":      "1804",
    "checksum": "714acc0aefb32b7d51b515e25546835e55a90da9fb00417fbee2d03a62801efd",

     "virtual_name": "centos7min.qcow2",
     "virtual_dir":  "centos7",
     "virtual_size": "20480",
     "virtual_mem":  "4096M",

     "Password": "password",

     "ansible_playbook": "testrole.yml"
  },

  "builders": [
    {
        "type": "qemu",

        "headless": true,

        "iso_url": "/home/ebal/Downloads/CentOS-7-x86_64-Minimal-{{user `7.5`}}.iso",
        "iso_checksum": "{{user `checksum`}}",
        "iso_checksum_type": "sha256",

        "communicator": "ssh",

        "ssh_username": "root",
        "ssh_password": "{{user `Password`}}",
        "ssh_timeout":  "3600s",

        "boot_command":[
          "<tab> text ",
          "console=ttyS0,115200n8 ",
          "ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/CentOS7-ks.cfg ",
          "nameserver=9.9.9.9 ",
          "<enter><wait> "
        ],
        "boot_wait": "0s",

        "http_directory": "/home/ebal/Downloads/",
        "http_port_min": "8090",
        "http_port_max": "8100",

        "shutdown_timeout": "30m",

    "accelerator": "kvm",
    "disk_size":   "{{ user `virtual_size` }}",
    "format":      "qcow2",
    "qemuargs":[
        [ "-m",      "{{ user `virtual_mem` }}" ],
            [ "-serial", "file:serial.out" ]
    ],

        "vm_name":          "{{ user `virtual_name` }}",
        "output_directory": "{{ user `virtual_dir` }}"
    }
  ],

  "provisioners":[
    {
      "type":          "ansible",
      "playbook_file": "{{ user `ansible_playbook` }}"
    }
  ],

  "post-processors":[
      {
          "type":   "compress",
          "format": "lz4",
          "output": "{{.BuildName}}.lz4"
      }
  ]
}

 

Tag(s): packer, ansible, qemu

June 08, 2018 06:06 PM

June 06, 2018

Raymii.org

That time when one of my HP-UX servers lost half of it's RAM (and how to connect to an HP iLO 2 with modern OpenSSH (7.6+))

One of my favorite sayings is: 'Hardware is stupid, move everything to the cloud!'. The cloud is just someone else's computer, but at least I'm not responsible for the hardware anymore, since hardware breaks. When a VM breaks, because you use configuration management and version control, just roll out a new one. We all know that's not true, but still, the thought of it is nice. Last week one of the HP-UX machines had a failing disk and this week it's back with a whole new issue. After it was rebooted (due to issues with the services running on it), the Event Monitoring Service (EMS) sent an email regarding RAM issues and after manual checking it seems the machine lost half of its RAM. It should have 16 GB and now it only has 8 GB. You might imagine my surprise. This post goes into my troubleshooting, since I was not able to go to the machine, shut it down and check if the RAM was still there. I'll cover the use of cstm (Support Tool Manager), how to connect to the HP iLO (out of band access) with modern OpenSSH (7.2) and the steps I took to gather information on what might have happened.

June 06, 2018 12:00 AM

June 04, 2018

SysAdmin1138

Retargeting the StackOverflow survey

Back in March I ranted about the StackOverflow Developer Survey analysis.

The core of the critique is that they had a nice trove of data useful for recruiters and didn't do anything useful with it. Their "rank these aspects of a potential job opportunity from least to most important" question is the big one, and their official analysis just reports most-desired/least-desired. Since pay is one of the options, pay was the most desired thing. My proposal is that the number two and three most popular things are actually quite useful, as those are what allow a company that can't compete directly on pay to get in front of people.

StackOverflow recently released the full survey data, allowing me to answer my own questions.

There is some very good data they could have presented, but chose not to. First of all, the number one, two and three priorities are the ones that people are most conscious of, and they may be willing to compromise on one to get the other two. This should have been presented.

  1. All respondents top-3 ranked.
  2. All men top-3 ranked.
  3. All women top-3 ranked.
  4. All non-binary top-3 ranked.

Because men were 92% of all respondents, I'm not going to bother looking at the all-respondents chart, since the men-only top-3 would look nigh identical. Here is the top-3 chart for things Men look for in a job opportunity.

assessJobTop3-male.png

  1. Pay/Benefits -- Pay me what I want
  2. Language/Framework -- in the languages I want
  3. Professional development -- and help me get better at it.

With Office Environment and Culture as a close number 4. Yes, the squishy SJW-like "office culture" made a high ranking with men.

The chart for women was a bit different, which is useful if you're targeting for diversity.

assessJobTop3-female.png

This chart shows four items at about the same prevalence in the top three:

  • Office Environment and Culture
  • Pay/Benefits
  • Professional Development
  • Framework/Language

The absolute rankings are a bit different, with Office Environment leading ahead of pay. The shape of this chart tells me that women value a healthy office environment and an employer willing to invest in their employees. Men value this as well, but not as much as they value money and what they'll be working on.

However, the StackOverflow survey had more genders than just Man/Woman. They tracked two more categories. How did the Transgender and Gender-Nonconforming/Genderqueer/Non-Binary populations compare? These are groups who typically are the diversity in their departments. How does that affect preferences when assessing a job?

assessJobTop3-trans.png

assessJobTop3-gnc.png

While these populations are rather smaller than the other two, a clear trend emerges. Both populations value Office Environment and Culture well above anything else. Both populations also rank Diversity very highly. The GNC/GQ/NB set values that above pay/benefits. This suggests that people in this set are willing to sacrifice pay to work someplace they'll be accepted into.

Looking at the ranked-in-top-3 charts has proven useful for recruitment, far more than simply looking at the top-ranked items.

  • Men and women look for the same things: pay, tech they'll be working with, professional development, and office-environment.
  • Women are more interested in professional development and office environment than men, which gives you hints about how to craft your postings to draw in more women.
  • Transgender and GNC/GQ/NB applicants replace professional development with diversity in what they look for. This also gives you hints about how to craft your postings to draw more of these applicants.
  • If you can't compete on pay, you can still compete on the other three points.

They also asked about assessing company benefits. That ended up being a less interesting analysis, as there was little variation between the genders. Here is the chart for Women.

assessBenefitsTop3-female.png

The rankings held true across all four genders, with some variation in the number 4 and 5 spot:

  1. Pay/Bonus
  2. Health insurance
  3. Retirement/pension matching.
  4. Conference/Education Budget
  5. Computer/Office equipment allowance.

Men had a tie between Stock Options and Computer/Office equipment allowance with Conference/Education Budget as the next-highest.

Transgender respondents preferred Computer/Office equipment allowance over Conference/Education Budget.

The GNC/GQ/NB respondents matched the women in preferences.

Even so, there are some suggestions here for recruiting departments to tailor their messaging regarding company benefits:

  • Pay is king, which we all know already.
  • Healthcare is second. If your company is doing anything special here, like covering employee copays/deductibles, definitely mention it.
  • 401k and other retirement matching is a worth-while differentiator.
    • If you have a 401k/403b match, mention it in the job-posting every time.
    • I need to see what the age-based rankings are for this one. I suspect that senior engineers, who are likely older, consider this one more important than juniors.
  • Only men care about stock-options as a high priority benefit.

There was something else I noticed in the charts, and it was hinted at in the official report. On the 'assessing a job opportunity' question, Men famously ranked diversity as their least important benefit. It looks like there was a clear "one star review" effect at work here. This is the chart of preferences for Men on that question.

assesJob-male.png

Note the curve of the 'Diversity' line. It's pretty smooth, until you get to 10 when it jumps sharply. That says a large population of respondents knew that Diversity would be their last-ranked item from the start, and the ranking didn't show up organically. The curve of the other lines suggests that 'Broad use of product' and 'Department/Team' were robbed of rankings to feed the Diversity tanking.

However, this is an extreme example of something I saw in the other genders. The Women show it pretty well.

assesJob-female.png

Look at the curves between 9 and 10. The items that were ranked in the top 3 have smooth curves across the chart. Yet something like Remote Working, which had an even distribution across ranks 1 through 9, suddenly jumps for rank 10. Looking at these curves, people have very clear opinions on what their top and bottom ranks are.

If you've read this far you've noticed I've pretty much ignored rankings 4 through 9. I believe these second charts show why that is; there isn't much information in them. In my opinion this is bad survey design. Having people rank a long list of things from least to most will give you a lot of wishy-washy opinions, which is not enough to build a model off of. Instead, have them pick a top-3 and maybe a bottom-2. Leave the rest alone. That will give you much clearer signals about preferences.

by SysAdmin1138 at June 04, 2018 07:32 PM

HolisticInfoSec.org

toolsmith #133 - Anomaly Detection & Threat Hunting with Anomalize

When, in October and November's toolsmith posts, I redefined DFIR under the premise of Deeper Functionality for Investigators in R, I discovered a "tip of the iceberg" scenario. To that end, I'd like to revisit the concept with an additional discovery and opportunity. In reality, this is really a case of DFIR (Deeper Functionality for Investigators in R) within the general practice of the original and paramount DFIR (Digital Forensics/Incident Response).
As discussed here before, those of us in the DFIR practice, and Blue Teaming at large, are overwhelmed by data and scale. Success truly requires algorithmic methods. If you're not already invested here I have an immediately applicable case study for you in tidy anomaly detection with anomalize.
First, let me give credit where entirely due for the work that follows. Everything I discuss and provide is immediately derivative from Business Science (@bizScienc), specifically Matt Dancho (@mdancho84). He created anomalize, "a tidy anomaly detection algorithm that’s time-based (built on top of tibbletime) and scalable from one to many time series," when a client asked Business Science to build an open source anomaly detection algorithm that suited their needs. I'd say he responded beautifully, when his blogpost hit my radar via R-Bloggers it lived as an open tab in my browser for more than a month until generating this toolsmith. Please consider Matt's post a mandatory read as step one of the process here. I'll quote Matt specifically before shifting context: "Our client had a challenging problem: detecting anomalies in time series on daily or weekly data at scale. Anomalies indicate exceptional events, which could be increased web traffic in the marketing domain or a malfunctioning server in the IT domain. Regardless, it’s important to flag these unusual occurrences to ensure the business is running smoothly. One of the challenges was that the client deals with not one time series but thousands that need to be analyzed for these extreme events."
Key takeaway: Detecting anomalies in time series on daily or weekly data at scale. Anomalies indicate exceptional events.
Now shift context with me to security-specific events and incidents, as they pertain to security monitoring, incident response, and threat hunting. In my November 2017 post, recall that I discussed Time Series Regression with the Holt-Winters method and a focus on seasonality and trends. Unfortunately, I couldn't share the code for how we applied TSR, but pointed out alternate methods, including Seasonal and Trend Decomposition using Loess (STL):
  • Handles any type of seasonality ~ can change over time
  • Smoothness of the trend-cycle can also be controlled by the user
  • Robust to outliers
Here now, Matt has created a means to immediately apply the STL method, along with the Twitter method (reference page), as part of his time_decompose() function, one of three functions specific to the anomalize package. In addition to time_decompose(), which separates the time series into seasonal, trend, and remainder components, anomalize includes:
  • anomalize(): Applies anomaly detection methods to the remainder component.
  • time_recompose(): Calculates limits that separate the “normal” data from the anomalies
The methods used in anomalize(), including IQR and GESD are described in Matt's reference page. Matt ultimately set out to build a scalable adaptation of Twitter's AnomalyDetection package in order to address his client's challenges in dealing with not one time series but thousands needing to be analyzed for extreme events. You'll note that Matt describes anomalize using a dataset of the daily download counts of  the 15 tidyverse packages from CRAN, relevant as he leverages the tidyverse package. I initially toyed with tweaking Matt's demo to model downloads for security-specific R packages (yes, there are such things) from CRAN, including RAppArmor, net.security, securitytxt, and cymruservices, the latter two courtesy of Bob Rudis (@hrbrmstr) of our beloved Data-Driven Security: Analysis, Visualization and Dashboards. Alas, this was a mere rip and replace, and really didn't exhibit the use of anomalize in a deserving, varied, truly security-specific context. That said, I was able to generate immediate results doing so, as seen in Figure 1

Figure 1: Initial experiment
As an initial experiment, you can replace the package names with those of your choosing in tidyverse_cran_downloads.R, run it in RStudio, then tweak variable names and labels in the code per Matt's README page.
I wanted to run anomalize against a real security data scenario, so I went back to the dataset from the original DFIR articles where I'd utilized counts of 4624 Event IDs per day, per user, on a given set of servers. As utilized originally, I'd represented results specific to only one device and user, but herein is the beauty of anomalize. We can achieve quick results across multiple times series (multiple systems/users). This premise is but one of many where time series analysis and seasonality can be applied to security data.
I originally tried to read the log data from log.csv straight into an anomalize.R script with logs = read_csv("log.csv") into a tibble (ready your troubles-with-tibbles jokes), but it was not being parsed accurately, particularly the time attributes. To correct this, I grabbed tidyverse_cran_downloads.R from Matt's GitHub and modified it accordingly (see the GitHubGist linked at the end of this post for the code).
This helped greatly, thanks to the tibbletime package, which "is an extension that allows for the creation of time aware tibbles. Some immediate advantages of this include: the ability to perform time based subsetting on tibbles, quickly summarising and aggregating results by time periods." Guess what, Matt wrote tibbletime too. :-)
I then followed Matt's sequence as he posted on Business Science, but with my logs defined as a function in Security_Access_Logs_Function.R. Following, I'll give you the code snippets, as revised from Matt's examples, followed by their respective results specific to processing my Event ID 4624 daily count log.
First, let's summarize daily login counts across three servers over four months.
The result is evident in Figure 2.

Figure 2: Server logon counts visualized
Next, let's determine which daily logon counts are anomalous with Matt's three main functions, time_decompose(), anomalize(), and time_recompose(), along with the visualization function, plot_anomalies(), across the same three servers over four months.
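The exact code is in the GitHubGist linked at the end of this post; as a rough, illustrative sketch only (assuming a grouped tibble of per-server daily counts with server, date and count columns, names are my own, not the author's), the pipeline looks roughly like this:

library(tidyverse)
library(anomalize)

# Illustrative sketch, not the author's exact code; column names are assumptions
logons %>%
  group_by(server) %>%
  time_decompose(count, method = "stl") %>%
  anomalize(remainder, method = "iqr") %>%
  time_recompose() %>%
  plot_anomalies(time_recomposed = TRUE, ncol = 3)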
The result is revealed in Figure 3.

Figure 3: Security event log anomalies
Following Matt's method using Twitter’s AnomalyDetection package, combining time_decompose(method = "twitter") with anomalize(method = "gesd"), while adjusting the trend = "4 months" to adjust median spans, we'll focus only on SERVER-549521.
In Figure 4, you'll note that there are anomalous logon counts on SERVER-549521 in June.
Figure 4: SERVER-549521 logon anomalies with Twitter & GESD methods
We can compare the Twitter (time_decompose) and GESD (anomalize) methods with the STL (time_decompose) and IQR (anomalize) methods, which use different decomposition and anomaly detection approaches.
Again, we note anomalies in June, as seen in Figure 5.
Figure 5: SERVER-549521 logon anomalies with STL & IQR methods
Obviously, the results are quite similar, as one would hope. Finally, let's use Matt's plot_anomaly_decomposition() to visualize the inner workings of how the algorithm detects anomalies in the remainder for SERVER-549521.
The result is a four part visualization, including observed, season, trend, and remainder as seen in Figure 6.
Figure 6: Decomposition for SERVER-549521 Logins
I'm really looking forward to putting these methods to use at a much larger scale, across a far broader event log dataset. I firmly assert that blue teams are already way behind in combating automated adversary tactics and problems of sheer scale, so...much...data. It's only with tactics such as Matt's anomalize, and others of its ilk, that defenders can hope to succeed. Be sure to watch Matt's YouTube video on anomalize; Business Science is building a series of additional videos, so keep an eye out there and on their GitHub for more great work that we can apply a blue team/defender's context to.
All the code snippets are in my GitHubGist here, and the sample log file, a single R script, and a Jupyter Notebook are all available for you on my GitHub under toolsmith_r. I hope you find anomalize as exciting and useful as I have. Great work by Matt; I'm looking forward to seeing what's next from Business Science.
Cheers...until next time.

by Russ McRee (noreply@blogger.com) at June 04, 2018 12:37 AM

June 03, 2018

Electricmonk.nl

direnv: Directory-specific environments

Over the course of a single day I might work on a dozen different admin or development projects. In the morning I could be hacking on some Zabbix monitoring scripts, in the afternoon on auto-generated documentation and in the evening on a Python or C project.

I try to keep my system clean and my projects as compartmentalized as possible, to avoid library version conflicts and such. When jumping from one project to another, the requirements of my shell environment can change significantly. One project may require /opt/nim/bin to be in my PATH. Another project might require a Python VirtualEnv to be active, or to have GOPATH set to the correct value. All in all, switching from one project to another incurs some overhead, especially if I haven't worked on it for a while.

Wouldn't it be nice if we could have our environment automatically set up simply by changing to the project's directory? With direnv we can.

direnv is an environment switcher for the shell. It knows how to hook into bash, zsh, tcsh, fish shell and elvish to load or unload environment variables depending on the current directory. This allows project-specific environment variables without cluttering the ~/.profile file.

Before each prompt, direnv checks for the existence of a ".envrc" file in the current and parent directories. If the file exists (and is authorized), it is loaded into a bash sub-shell and all exported variables are then captured by direnv and then made available to the current shell.

It's easy to use. Here's a quick guide:

Install direnv (I'm using Ubuntu, but direnv is available for many Unix-like systems):

fboender @ jib ~ $ sudo apt install direnv

You'll have to add direnv to your .bashrc in order for it to work:

fboender @ jib ~ $ tail -n1 ~/.bashrc
eval "$(direnv hook bash)"

In the base directory of your project, create a .envrc file. For example:

fboender @ jib ~ $ cat ~/Projects/fboender/foobar/.envrc 
#!/bin/bash

# Settings
PROJ_DIR="$PWD"
PROJ_NAME="foobar"
VENV_DIR="/home/fboender/.pyenvs"
PROJ_VENV="$VENV_DIR/$PROJ_NAME"

# Create Python virtualenv if it doesn't exist yet
if [ \! -d "$PROJ_VENV" ]; then
    echo "Creating new environment"
    virtualenv -p python3 $PROJ_VENV
    echo "Installing requirements"
    $PROJ_VENV/bin/pip3 install -r ./requirements.txt
fi

# Emulate the virtualenv's activate, because we can't source things in direnv
export VIRTUAL_ENV="$PROJ_VENV"
export PATH="$PROJ_VENV/bin:$PATH:$PWD"
export PS1="(`basename \"$VIRTUAL_ENV\"`) $PS1"
export PYTHONPATH="$PWD/src"

This example automatically creates a Python3 virtualenv for the project if it doesn't exist yet, and installs the dependencies. Since we can only export environment variables directly, I'm emulating the virtualenv's bin/activate script by setting some Python-specific variables and exporting a new prompt.

Now when we change to the project's directory, or any underlying directory, direnv tries to activate the environment:

fboender @ jib ~ $ cd ~/Projects/fboender/foobar/
direnv: error .envrc is blocked. Run `direnv allow` to approve its content.

This warning is to be expected. Running random code when you switch to a directory can be dangerous, so direnv wants you to explicitly confirm that it's okay. When you see this message, you should always verify the contents of the .envrc file!

We allow the .envrc, and direnv starts executing the contents. Since the python virtualenv is missing, it automatically creates it and installs the required dependencies. It then sets some paths in the environment and changes the prompt:

fboender @ jib ~ $ direnv allow
direnv: loading .envrc
Creating new environment
Already using interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /home/fboender/.pyenvs/foobar/bin/python3
Also creating executable in /home/fboender/.pyenvs/foobar/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
Installing requirements
Collecting jsonxs (from -r ./requirements.txt (line 1))
Collecting requests (from -r ./requirements.txt (line 2))
  Using cached https://files.pythonhosted.org/packages/49/df/50aa1999ab9bde74656c2919d9c0c085fd2b3775fd3eca826012bef76d8c/requests-2.18.4-py2.py3-none-any.whl
Collecting tempita (from -r ./requirements.txt (line 3))
Collecting urllib3<1.23,>=1.21.1 (from requests->-r ./requirements.txt (line 2))
  Using cached https://files.pythonhosted.org/packages/63/cb/6965947c13a94236f6d4b8223e21beb4d576dc72e8130bd7880f600839b8/urllib3-1.22-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests->-r ./requirements.txt (line 2))
  Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests->-r ./requirements.txt (line 2))
  Using cached https://files.pythonhosted.org/packages/7c/e6/92ad559b7192d846975fc916b65f667c7b8c3a32bea7372340bfe9a15fa5/certifi-2018.4.16-py2.py3-none-any.whl
Collecting idna<2.7,>=2.5 (from requests->-r ./requirements.txt (line 2))
  Using cached https://files.pythonhosted.org/packages/27/cc/6dd9a3869f15c2edfab863b992838277279ce92663d334df9ecf5106f5c6/idna-2.6-py2.py3-none-any.whl
Installing collected packages: jsonxs, urllib3, chardet, certifi, idna, requests, tempita
Successfully installed certifi-2018.4.16 chardet-3.0.4 idna-2.6 jsonxs-0.6 requests-2.18.4 tempita-0.5.2 urllib3-1.22
direnv: export +PYTHONPATH +VIRTUAL_ENV ~PATH
(foobar) fboender @ jib ~/Projects/fboender/foobar (master) $

I can now work on the project without having to manually switch anything. When I'm done with the project and change to a different dir, it automatically unloads:

(foobar) fboender @ jib ~/Projects/fboender/foobar (master) $ cd ~
direnv: unloading
fboender @ jib ~ $

And that's about it! You can read more about direnv on its homepage.

by admin at June 03, 2018 07:32 AM

June 02, 2018

Electricmonk.nl

SSL/TLS client certificate verification with Python v3.4+ SSLContext

Normally, an SSL/TLS client verifies the server's certificate. It's also possible for the server to require a signed certificate from the client. These are called Client Certificates. This ensures that not only can the client trust the server, but the server can also trust the client.

Traditionally in Python, you'd pass the ca_certs parameter to the ssl.wrap_socket() function on the server to enable client certificates:

# Client
ssl.wrap_socket(s, ca_certs="ssl/server.crt", cert_reqs=ssl.CERT_REQUIRED,
                certfile="ssl/client.crt", keyfile="ssl/client.key")

# Server
ssl.wrap_socket(connection, server_side=True, certfile="ssl/server.crt",
                keyfile="ssl/server.key", ca_certs="ssl/client.crt")

Since Python v3.4, the more secure, and thus preferred method of wrapping a socket in the SSL/TLS layer is to create an SSLContext instance and call SSLContext.wrap_socket(). However, the SSLContext.wrap_socket() method does not have the ca_certs parameter. Neither is it directly obvious how to enable requirement of client certificates on the server-side.

The documentation for SSLContext.load_default_certs() does mention client certificates:

Purpose.CLIENT_AUTH loads CA certificates for client certificate verification on the server side.

But SSLContext.load_default_certs() loads the system's default trusted Certificate Authority chains so that the client can verify the server's certificates. You generally don't want to use these for client certificates.

In the Verifying Certificates section, it mentions that you need to specify CERT_REQUIRED:

In server mode, if you want to authenticate your clients using the SSL layer (rather than using a higher-level authentication mechanism), you’ll also have to specify CERT_REQUIRED and similarly check the client certificate.

I didn't spot how to specify CERT_REQUIRED in either the SSLContext constructor or the wrap_socket() method. Turns out you have to manually set a property on the SSLContext on the server to enable client certificate verification, like this:

context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
context.verify_mode = ssl.CERT_REQUIRED
context.load_cert_chain(certfile=server_cert, keyfile=server_key)
context.load_verify_locations(cafile=client_certs)

Here's a full example of a client and server who both validate each other's certificates:

For this example, we'll create self-signed server and client certificates. Normally you'd use a server certificate from a Certificate Authority such as Let's Encrypt, and would set up your own Certificate Authority so you can sign and revoke client certificates.

Create server certificate:

openssl req -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout server.key -out server.crt

Make sure to enter 'example.com' for the Common Name.
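
If you'd rather not answer the interactive prompts, the subject can also be supplied non-interactively with openssl's -subj option (the value here is only an example):

openssl req -new -newkey rsa:2048 -days 365 -nodes -x509 \
    -subj "/CN=example.com" -keyout server.key -out server.crt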

Next, generate a client certificate:

openssl req -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout client.key -out client.crt

The Common Name for the client certificate doesn't really matter.

Client code:

#!/usr/bin/python3

import socket
import ssl

host_addr = '127.0.0.1'
host_port = 8082
server_sni_hostname = 'example.com'
server_cert = 'server.crt'
client_cert = 'client.crt'
client_key = 'client.key'

context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=server_cert)
context.load_cert_chain(certfile=client_cert, keyfile=client_key)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
conn = context.wrap_socket(s, server_side=False, server_hostname=server_sni_hostname)
conn.connect((host_addr, host_port))
print("SSL established. Peer: {}".format(conn.getpeercert()))
print("Sending: 'Hello, world!")
conn.send(b"Hello, world!")
print("Closing connection")
conn.close()

Server code:

#!/usr/bin/python3

import socket
from socket import AF_INET, SOCK_STREAM, SO_REUSEADDR, SOL_SOCKET, SHUT_RDWR
import ssl

listen_addr = '127.0.0.1'
listen_port = 8082
server_cert = 'server.crt'
server_key = 'server.key'
client_certs = 'client.crt'

context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
context.verify_mode = ssl.CERT_REQUIRED
context.load_cert_chain(certfile=server_cert, keyfile=server_key)
context.load_verify_locations(cafile=client_certs)

bindsocket = socket.socket()
bindsocket.bind((listen_addr, listen_port))
bindsocket.listen(5)

while True:
    print("Waiting for client")
    newsocket, fromaddr = bindsocket.accept()
    print("Client connected: {}:{}".format(fromaddr[0], fromaddr[1]))
    conn = context.wrap_socket(newsocket, server_side=True)
    print("SSL established. Peer: {}".format(conn.getpeercert()))
    buf = b''  # Buffer to hold received client data
    try:
        while True:
            data = conn.recv(4096)
            if data:
                # Client sent us data. Append to buffer
                buf += data
            else:
                # No more data from client. Show buffer and close connection.
                print("Received:", buf)
                break
    finally:
        print("Closing connection")
        conn.shutdown(socket.SHUT_RDWR)
        conn.close()

Output from the server looks like this:

$ python3 ./server.py 
Waiting for client
Client connected: 127.0.0.1:51372
SSL established. Peer: {'subject': ((('countryName', 'AU'),),
(('stateOrProvinceName', 'Some-State'),), (('organizationName', 'Internet
Widgits Pty Ltd'),), (('commonName', 'someclient'),)), 'issuer':
((('countryName', 'AU'),), (('stateOrProvinceName', 'Some-State'),),
(('organizationName', 'Internet Widgits Pty Ltd'),), (('commonName',
'someclient'),)), 'notBefore': 'Jun  1 08:05:39 2018 GMT', 'version': 3,
'serialNumber': 'A564F9767931F3BC', 'notAfter': 'Jun  1 08:05:39 2019 GMT'}
Received: b'Hello, world!'
Closing connection
Waiting for client

Output from the client:

$ python3 ./client.py 
SSL established. Peer: {'notBefore': 'May 30 20:47:38 2018 GMT', 'notAfter':
'May 30 20:47:38 2019 GMT', 'subject': ((('countryName', 'NL'),),
(('stateOrProvinceName', 'GLD'),), (('localityName', 'Ede'),),
(('organizationName', 'Electricmonk'),), (('commonName', 'example.com'),)),
'issuer': ((('countryName', 'NL'),), (('stateOrProvinceName', 'GLD'),),
(('localityName', 'Ede'),), (('organizationName', 'Electricmonk'),),
(('commonName', 'example.com'),)), 'version': 3, 'serialNumber':
'CAEC89334941FD9F'}
Sending: 'Hello, world!
Closing connection

A few notes:

  • You can concatenate multiple client certificates into a single PEM file to authenticate different clients.
  • You can re-use the same cert and key on both the server and client. This way, you don't need to generate a specific client certificate. However, any clients using that certificate will require the key, and will be able to impersonate the server. There's also no way to distinguish between clients anymore.
  • You don't need to setup your own Certificate Authority and sign client certificates. You can just generate them with the above mentioned openssl command and add them to the trusted certificates file. If you no longer trust the client, just remove the certificate from the file.
  • I'm not sure if the server verifies the client certificate's expiration date.

 

by admin at June 02, 2018 10:12 AM

Steve Kemp's Blog

A brief metric-update, and notes on golang-specific metrics

My previous post briefly described the setup of system-metric collection. (At least the server-side setup required to receive the metrics submitted by various clients.)

When it came to the clients I was complaining that collectd was too heavyweight, as installing it pulled in a ton of packages. A kind twitter user pointed out that you can get most of the stuff you need via the collectd-core package:

 # apt-get install collectd-core

I guess I should have known that! So for the moment that's what I'm using to submit metrics from my hosts. In the future I will spend more time investigating telegraf, and other "modern" solutions.

Still, with collectd-core installed we've got the host-system metrics pretty well covered. Some other things I've put together also support metric-submission, so that's good.

I hacked up a quick package for automatically submitting metrics to a remote server, specifically for golang applications. To use it simply add an import to your golang application:

  import (
    ..
    _ "github.com/skx/golang-metrics"
    ..
  )

Add the import, and rebuild your application and that's it! Configuration is carried out solely via environmental variables, and the only one you need to specify is the end-point for your metrics host:

$ METRICS=metrics.example.com:2003 ./foo

Now your application will be running as usual and will also be submitting metrics to your central host every 10 seconds or so. Metrics include the number of running goroutines, application-uptime, and memory/cpu stats.

I've added a JSON-file to import as a grafana dashboard, and you can see an example of what it looks like there too:

June 02, 2018 09:01 AM

June 01, 2018

Everything Sysadmin

I'm hiring SREs/sysadmins and more!

My team at Stack Overflow has a number of open positions on the team that I manage:

  1. Windows-focused SRE, New York City or Northern NJ: If you love PowerShell, you'll love this position! If your background is more sysadmin than SRE, we're willing to train.
  2. Linux SRE, Denver: This will be particularly exciting because we're about to make some big technology changes and it will be a great opportunity to learn and grow with the company.

We have a number of other openings around the company:

  1. Junior Technology Concierge (IT Help Desk): New York City
  2. Engineering Manager: Remote (US East Coast Time Zone)
  3. VP of Engineering: New York City. (This person will be my boss!)
  4. Technical Recruiter: New York, NY
  5. If you like working in the Windows developer ecosystem (C#, ASP.NET, and MS SQL Server), we have two such developer positions: web developer and internal apps developer.

Those (and more!) open positions are listed here: https://stackoverflow.com/work-here

by Tom Limoncelli at June 01, 2018 07:49 PM

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – May 2018

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts based on last month’s visitor data (excluding other monthly or annual round-ups):
  1. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010 – much ancient!) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our research on developing security monitoring use cases here – and that we UPDATED FOR 2018.
  2. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009 (oh, wow, ancient history!). Is it relevant now? You be the judge. Succeeding with SIEM requires a lot of work, whether you paid for the software, or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” …
  3. “Simple Log Review Checklist Released!” is often at the top of this list – this rapidly aging checklist is still a useful tool for many people. “On Free Log Management Tools” (also aged quite a bit by now) is a companion to the checklist (updated version)
  4. “Updated With Community Feedback SANS Top 7 Essential Log Reports DRAFT2” is about the top log reports project of 2008-2013; I think these are still very useful in response to “what reports will give me the best insight from my logs?”
  5. Again, my classic PCI DSS Log Review series is extra popular! The series of 18 posts cover a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book  – note that this series is even mentioned in some PCI Council materials.
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has more than 7X of the traffic of this blog]: 

Critical reference posts:
Current research on SIEM, SOC, etc

Just finished research on testing security:
Just finished research on threat detection “starter kit”
Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Other posts in this endless series:

by Anton Chuvakin (anton@chuvakin.org) at June 01, 2018 04:04 PM

May 31, 2018

Errata Security

The First Lady's bad cyber advice

First Lady Melania Trump announced a guide to help children go online safely. It has problems.

Melania's guide is full of outdated, impractical, inappropriate, and redundant information. But that's allowed, because it relies upon moral authority: to be moral is to be secure, to be moral is to do what the government tells you. It matters less whether the advice is technically accurate, and more that you are supposed to do what authority tells you.

That's a problem, not just with her guide, but most cybersecurity advice in general. Our community gives out advice without putting much thought into it, because it doesn't need thought. You should do what we tell you, because being secure is your moral duty.

This post picks apart Melania's document. The purpose isn't to fine-tune her guide and make it better. Instead, the purpose is to demonstrate the idea of resting on moral authority instead of technical authority.

Strong Passwords



"Strong passwords" is the quintessential cybersecurity cliché that insecurity is due to some "weakness" (laziness, ignorance, greed, etc.) and the remedy is to be "strong".

The first flaw is that this advice is outdated. Ten years ago, important websites would frequently get hacked and have poor password protection (like MD5 hashing). Back then, strength mattered, to stop hackers from brute force guessing the hacked passwords. These days, important websites get hacked less often and protect the passwords better (like salted bcrypt). Moreover, the advice is now often redundant: websites, at least the important ones, enforce a certain level of password complexity, so that even without advice, you'll be forced to do the right thing most of the time.

This advice is outdated for a second reason: hackers have gotten a lot better at cracking passwords. Ten years ago, they focused on brute force, trying all possible combinations. Partly because passwords are now protected better, dramatically reducing the effectiveness of the brute force approach, hackers have had to focus on other techniques, such as the mutated dictionary and Markov chain attacks. Consequently, even though "Password123!" seems to meet the above criteria of a strong password, it'll fall quickly to a mutated dictionary attack. The simple recommendation of "strong passwords" is no longer sufficient.


The last part of the above advice is to avoid password reuse. This is good advice. However, this becomes impractical advice, especially when the user is trying to create "strong" complex passwords as described above. There's no way users/children can remember that many passwords. So they aren't going to follow that advice.

To make the advice work, you need to help users with this problem. To begin with, you need to tell them to write down all their passwords. This is something many people avoid, because they've been told to be "strong" and writing down passwords seems "weak". Indeed it is, if you write them down in an office environment and stick them on a note on the monitor or underneath the keyboard. But they are safe and strong if it's on paper stored in your home safe, or even in a home office drawer. I write my passwords on the margins in a book on my bookshelf -- even if you know that, it'll take you a long time to figure out which book when invading my home.

The other option to help avoid password reuse is to use a password manager. I don't recommend them to my own parents because that'd be just one more thing I'd have to help them with, but they are fairly easy to use. It means you need only one password for the password manager, which then manages random/complex passwords for all your web accounts.

So what we have here is outdated and redundant advice that overshadows good advice that is nonetheless incomplete and impractical. The advice is based on the moral authority of telling users to be "strong" rather than the practical advice that would help them.

No personal info unless website is secure

The guide teaches kids to recognize the difference between a secure/trustworthy and insecure website. This is laughably wrong.


HTTPS means the connection to the website is secure, not that the website is secure. These are different things. It means hackers are unlikely to be able to eavesdrop on the traffic as it's transmitted to the website. However, the website itself may be insecure (easily hacked), or worse, it may be a fraudulent website created by hackers to appear similar to a legitimate website.

This misconception about what HTTPS actually secures is perpetuated by guides like this. It is also the source of criticism for LetsEncrypt, an initiative to give away free website certificates so that everyone can get HTTPS. Hackers now routinely use LetsEncrypt to create fraudulent websites that host their viruses. Since people have been taught forever that HTTPS means a website is "secure", people trust these hacker websites.

But LetsEncrypt is a good thing, all connections should be secure. What's bad is not LetsEncrypt itself, but guides like this from the government that have for years been teaching people the wrong thing, that HTTPS means a website is secure.

Backups

Of course, no guide would be complete without telling people to backup their stuff.


This is especially important with the growing ransomware threat. Ransomware is a type of virus/malware that encrypts your files then charges you money to get the key to decrypt the files. Half the time this just destroys the files.

But this again is moral authority, telling people what to do, instead of educating them how to do it. Most will ignore this advice because they don't know how to effectively backup their stuff.

For most users, it's easy to go to the store and buy a 256-gigabyte USB drive for $40 (as of May 2018), then use the "Time Machine" feature in macOS, or on Windows the "File History" feature or the "Backup and Restore" feature. These can be configured to do the backup automatically on a regular basis so that you don't have to worry about it.

But such "local" backups are still problematic. If the drive is left plugged into the machine, ransomeware can attack the backup. If there's a fire, any backup in your home will be destroyed along with the computer.

I recommend cloud backup instead. There are so many good providers, like DropBox, Backblaze, Microsoft, Apple's iCloud, and so on. These are especially critical for phones: if your iPhone is destroyed or stolen, you can simply walk into an Apple store and buy a new one, with everything replaced as it was from their iCloud.

But all of this is missing the key problem: your photos. You carry a camera with you all the time now and take a lot of high resolution photos. This quickly exceeds the capacity of most of the free backup solutions. You can configure these, such as your phone's iCloud backup, to exclude photos, but that means you are prone to losing your photos/memories. For example, DropBox is great for the free 5 gigabyte service, but if I want to preserve photos on it, I have to pay for their more expensive service.

One of the key messages kids should learn about photos is that they will likely lose almost all of the photos they've taken within 5 years. The exceptions will be the few photos they've posted to social media, which sorta serves as a cloud backup for them. If they want to preserve the rest of these memories, the kids need to take finding backup solutions seriously. I'm not sure of the best solution, but I buy big USB flash drives and send them to my niece asking her to copy all her photos to them, so that at least I can put that in a safe.

One surprisingly good solution is Microsoft Office 365. For $99 a year, you get a copy of their Office software (which I use), but it also comes with a large 1 terabyte of cloud storage, which is likely big enough for your photos. Apple charges around the same amount for 1 terabyte of iCloud, though it doesn't come with a free license for Microsoft Office :-).

WiFi encryption

Your home WiFi should be encrypted, of course.


I have to point out the language, though. Turning on WPA2 WiFi encryption does not "secure your network". Instead, it just secures the radio signals from being eavesdropped. Your network may have other vulnerabilities, where encryption won't help, such as when your router has remote administration turned on with a default or backdoor password enabled.

I'm being a bit pedantic here, but it's not my argument. It's the FTC's argument when they sued vendors like D-Link for making exactly the same sort of recommendation. The FTC claimed it was deceptive business practice because recommending users do things like this still didn't mean the device was "secure". Since the FTC is partly responsible for writing Melania's document, I find this a bit ironic.

In any event, WPA2 personal has problems where it can be hacked, such as if WPS is enabled, or evil twin access-points broadcasting stronger (or more directional) signals. It's thus insufficient security. To be fully secure against possible WiFi eavesdropping you need to enable enterprise WPA2, which isn't something most users can do.

Also, WPA2 is largely redundant. If you wardrive your local neighborhood you'll find that almost everyone has WPA enabled already anyway. Guides like this probably don't need to advise what everyone's already doing, especially when it's still incomplete.

Change your router password


Yes, leaving the default password on your router is a problem, as shown by recent Mirai-style attacks, such as the very recent ones where Russia has infected 500,000 routers in its cyberwar against Ukraine. But those were only a problem because routers also had remote administration enabled. It's remote administration you need to make sure is disabled on your router, regardless of whether you change the default password (as there are other vulnerabilities besides passwords). If remote administration is disabled, then it's very rare that people will attack your router with the default password.

Thus, the guide ignores the important thing (remote administration) and instead focuses on the less important thing (changing the default password).

In addition, this advice again makes the impractical recommendation of choosing a complex (strong) password. Users who do this usually forget it by the time they next need it. Practical advice is to recommend users write down the password they choose, and put it either someplace they won't forget (like with the rest of their passwords), or on a sticky note under the router.

Update router firmware

Like any device on the network, your router should be kept up-to-date with the latest patches. But you aren't going to do that, because it's not practical. While your laptop/desktop and phone nag you about updates, your router won't. Whereas phones/computers update once a month, your router vendor will update the firmware once a year -- and after a few years, stop releasing any more updates at all.

Routers are just one of many IoT devices we are going to have to come to terms with, keeping them patched. I don't know the right answer. I check my parents' stuff every Thanksgiving, so maybe that's a good strategy: patch your stuff at the end of every year. Maybe some cultural norms will develop, but simply telling people to be strong about their IoT firmware patches isn't going to be practical in the near term.

Don't click on stuff

This is probably the most common cybersecurity advice given by infosec professionals. It is wrong.


Emails/messages are designed for you to click on things. You regularly get emails/messages from legitimate sources that demand you click on things. It's so common from legitimate sources that there's no practical way for users to distinguish between them and bad sources. As that Google Docs bug showed, even experts can't always tell the difference.

I mean, it's true that phishing attacks coming through emails/messages try to trick you into clicking on things, and you should be suspicious of such things. However, it doesn't follow from this that not clicking on things is a practical strategy. It's like diet advice recommending you stop eating food altogether.

Sex predators, oh my!

Of course, it's kids going online, so of course you are going to have warnings about sexual predators:


But online predators are rare. The predator threat to children is overwhelmingly from relatives and acquaintances, a much smaller threat from strangers, and a vanishingly tiny threat from online predators. Recommendations like this stem from our fears of the unknown technology rather than a rational measurement of the threat.

Sexting, oh my!

So here is one piece of advice that I can agree with: don't sext:


But the reason this is bad is not because it's immoral or wrong, but because adults have gone crazy and made it illegal for children to take nude photographs of themselves. As this article points out, your child is more likely to get in trouble and get placed on the sex offender registry (for life) than to get molested by a person on that registry.

Thus, we need to warn kids not from some immoral activity, but from adults who've gotten freaked out about it. Yes, sending pictures to your friends/love-interest will also often get you in trouble as those images will frequently get passed around school, but such temporary embarrassments will pass. Getting put on a sex offender registry harms you for life.

Texting while driving

Finally, I want to point out this error:


The evidence is to the contrary: it's not actually dangerous -- it's just assumed to be dangerous. Texting rarely distracts drivers from what's going on on the road. It instead replaces some other inattention, such as daydreaming, fiddling with the radio, or checking yourself in the mirror. Risk compensation happens: when people are texting while driving, they are also slowing down and leaving more space between themselves and the car in front of them.

Studies have shown this. For example, one study measured accident rates at 6:59pm vs 7:01pm and found no difference. That's when "free evening texting" came into effect, so if texting while driving were dangerous, we should've seen a bump in the number of accidents. The researchers even tried to narrow the effect down, such as by looking at people texting while changing cell towers (proving they were in motion).

Yes, texting is illegal, but that's because people are fed up with the jerk in front of them not noticing the light is green. It's not illegal because it's particularly dangerous or because it has a measurable impact on accident rates.

Conclusion

The point of this post is not to refine the advice and make it better. Instead, I attempt to demonstrate how such advice rests on moral authority: it's the government telling you so, and cybersecurity and safety are treated as higher moral duties. Much of it is outdated, impractical, inappropriate, and redundant.

We need to move away from this sort of advice. Instead of moral authority, we need technical authority. We need to focus on the threats that people actually face. Instead of commanding people what to do and shaming them for their insecurity, we need to help them be secure. It's like Strunk and White's "Elements of Style": they don't take the moral-authority approach and simply tell people how to write; they try to help people write well.

by Robert Graham (noreply@blogger.com) at May 31, 2018 09:06 PM

May 30, 2018

Errata Security

masscan, macOS, and firewall

One of the more useful features of masscan is the "--banners" check, which connects to the TCP port, sends some request, and gets a basic response back. However, since masscan has its own TCP stack, it'll interfere with the operating system's TCP stack if they are sharing the same IPv4 address. The operating system will reply with a RST packet before the TCP connection can be established.

The way to fix this is to use the built-in packet-filtering firewall to block those packets in the operating-system TCP/IP stack. The masscan program still sees everything before the packet-filter, but the operating system can't see anything after the packet-filter.


Note that we are talking about the "packet-filter" firewall feature here. Remember that macOS, like most operating systems these days, has two separate firewalls: an application firewall and a packet-filter firewall. The application firewall is the one you see in System Settings labeled "Firewall", and it controls things based upon the application's identity rather than by which ports it uses. This is normally "on" by default. The packet-filter is normally "off" by default and is of little use to normal users.

Also note that macOS changed packet-filters around version 10.10.5 ("Yosemite", October 2014). The older one is known as "ipfw", which was the default firewall for FreeBSD (much of macOS is based on FreeBSD). The replacement is known as PF, which comes from OpenBSD. Whereas you used to use the old "ipfw" command on the command line, you now use the "pfctl" command, as well as the "/etc/pf.conf" configuration file.

What we need to filter is the source port of the packets that masscan will send, so that when replies are received, they won't reach the operating-system stack and will instead go only to masscan. To do this, we need to find a range of ports that won't conflict with the operating system. Namely, when the operating system creates outgoing connections, it randomly chooses a source port within a certain range. We want masscan to use source ports in a different range.

To figure out the range macOS uses, we run the following command:

sysctl net.inet.ip.portrange.first net.inet.ip.portrange.last

On my laptop, which is probably the default for macOS, I get the following range. Sniffing with Wireshark confirms this is the range used for source ports for outgoing connections.

net.inet.ip.portrange.first: 49152
net.inet.ip.portrange.last: 65535

So this means I shouldn't use source ports anywhere in the range 49152 to 65535. On my laptop, I've decided to have masscan use the ports 40000 to 41023. The range masscan uses must be a power of 2, so here I'm using 1024 (two to the tenth power).

To configure masscan, I can either type the parameter "--source-port 40000-41023" every time I run the program, or I can add the following line to /etc/masscan/masscan.conf. Remember that by default, masscan will look in that configuration file for any configuration parameters, so you don't have to keep retyping them on the command line.

source-port = 40000-41023

Next, I need to add the following firewall rule to the bottom of /etc/pf.conf:

block in proto tcp from any to any port 40000:41023
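
Before loading anything, you can ask pf to parse the file without applying it (a quick sanity check; the -n flag does a dry run):

pfctl -nf /etc/pf.conf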

However, we aren't done yet. By default, the packet-filter firewall is off on some versions of macOS. Therefore, every time you reboot your computer, you need to enable it. The simple way to do this is on the command line run:

pfctl -e

Or, if that doesn't work, try:

pfctl -E

If the firewall is already running, then you'll need to load the file explicitly (or reboot):

pfctl -f /etc/pf.conf

You can check to see if the rule is active:

pfctl -s rules
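
With the rule loaded and the source-port range in masscan.conf, a banner scan then looks something like this (the target network below is only a placeholder; substitute your own):

# run as root; --source-port is picked up from /etc/masscan/masscan.conf
masscan 198.51.100.0/24 -p80,443 --banners --rate 1000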



by Robert Graham (noreply@blogger.com) at May 30, 2018 09:50 PM

May 24, 2018

The Lone Sysadmin

vSphere 6.7 Will Not Run In My Lab: A Parable

“Hey Bob, I tried installing vSphere 6.7 on my lab servers and it doesn’t work right. You tried using it yet? Been beating my head against a wall here.” “Yeah, I really like it. A lot. Like, resisting the urge to be irresponsible and upgrade everything. What are your lab servers?” I knew what he […]

The post vSphere 6.7 Will Not Run In My Lab: A Parable appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at May 24, 2018 03:12 PM

May 23, 2018

Evaggelos Balaskas

CentOS Bootstrap

CentOS 6

This method has been suggested for building a container image from your current CentOS system.

 

In my case, I need to remotely upgrade a running CentOS 6 system to a new clean CentOS 7 on a test VPS, without needing to open the VNC console, attach a new ISO, and so on.

I am rather lucky, as I have a clean extra partition on this VPS, so I will follow the process below to remotely install a new clean CentOS 7 onto this partition, then add a new grub entry and boot into it.

 

Current OS

# cat /etc/redhat-release
CentOS release 6.9 (Final)

 

Format partition

format & mount the partition:

 mkfs.ext4 -L rootfs /dev/vda5
 mount /dev/vda5 /mnt/

 

InstallRoot

Type:

# yum -y groupinstall "Base" --releasever 7 --installroot /mnt/ --nogpgcheck

 

Test

When the installation has finished, test it:

mount --bind /dev/  /mnt/dev/
mount --bind /sys/  /mnt/sys/
mount --bind /proc/ /mnt/proc/

chroot /mnt/

bash-4.2#  cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

It works!

 

Root Password

Inside the chroot environment:

bash-4.2# passwd
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

bash-4.2# exit
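
Before moving on, it doesn't hurt to undo the bind mounts (a small cleanup step, not strictly required):

umount /mnt/proc/
umount /mnt/sys/
umount /mnt/dev/
umount /mnt/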

 

Grub

Add the new grub entry for CentOS 7 to the grub configuration:

title CentOS 7
        root (hd0,4)
        kernel /boot/vmlinuz-3.10.0-862.2.3.el7.x86_64 root=/dev/vda5 ro rhgb LANG=en_US.UTF-8
        initrd /boot/initramfs-3.10.0-862.2.3.el7.x86_64.img

By changing the default boot entry from 0 to 1:

default=0

to

default=1

our system will boot into CentOS 7 on the next reboot!

 

May 23, 2018 08:28 PM

syslog.me

Concurrency in Go

In my quest to learn the Go language I am currently in the process of doing the Go Code Clinic. It's taking me quite some time because instead of going through the solutions proposed in the course I try to implement a solution by myself; only when I have no idea whatsoever about how to proceed do I peek into the solution to get some insight, and then work independently on my solution again.

The second problem in the Clinic is already at a non-trivial level: compare a number of images with a bigger image to check if any of those is a "clipping" of the bigger one. I confess that I would have a lot to read and a lot of work to do even if I were trying to solve it in Perl!

It took some banging of my head against the wall till I eventually solved the problem. Unfortunately my program is single-threaded and the process of matching images is very expensive. For example, it took more than two hours to match a clipping sized 967×562 pixels with its "base" image sized 2048×1536. And for the whole time only one CPU thread was running at 100%, the others were barely used. If I really want to say that I solved the problem, I must adapt the program to the available computing power by starting a number of subprocesses/threads (in our case: goroutines) to distribute the search across several CPU threads.

Since this was completely new to me in golang, I decided to experiment with a much simpler program: generate up to 100 random integers (say) between 0 and 10000 and run 8 workers to find out if any of these random numbers is a multiple of a given number, for example 17. And of course the program must shut down gracefully, whether or not a multiple is found. This gave me a few problems to solve:

  • how do I start exactly 8 worker goroutines?
  • what’s the best way to pass them the numbers to check? what’s the best way for them to report back the result?
  • how do I tell them to stop when it's time for them to shut down?
  • how do I wait until they have actually shut down?

The result is the go program that you can find in this gist. Assuming that it is good enough, you can use it as a skeleton for a program of yours, re-implementing the worker part and maybe the reaper part if a boolean response is not enough. Enjoy!

by bronto at May 23, 2018 03:34 PM

The Lone Sysadmin

Midnight is a Confusing Choice for Scheduling

Midnight is a poor choice for scheduling anything. Midnight belongs to tomorrow. It’s 0000 on the clock, which is the beginning of the next day. That’s not how humans think, though, because tomorrow is after we wake up! A great example is a statement like “proposals are due by midnight on April 15.” What you […]

The post Midnight is a Confusing Choice for Scheduling appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at May 23, 2018 02:54 PM

syslog.me

Lightning Calendar, Google and “foreign” addresses

This is written to my older self, and to all those using Mozilla Thunderbird and the Lightning Calendar add-on with Google calendars who see this:

[screenshot: Lightning-Error-foreign-calendar]

If you are seeing this, the solution is to change the setting calendar.google.enableEmailInvitations to true:

[screenshot: Lightning-foreign-calendar-setting]

and everything should work as expected:

[screenshot: Lightning-working-event]

Enjoy!

 

by bronto at May 23, 2018 01:19 PM

Vincent Bernat

Multi-tier load-balancing with Linux

A common solution to provide a highly-available and scalable service is to insert a load-balancing layer to spread requests from users to backend servers.1 We usually have several expectations for such a layer:

scalability
It allows a service to scale by pushing traffic to newly provisioned backend servers. It should also be able to scale itself when it becomes the bottleneck.
availability
It provides high availability to the service. If one server becomes unavailable, the traffic should be quickly steered to another server. The load-balancing layer itself should also be highly available.
flexibility
It handles both short and long connections. It is flexible enough to offer all the features backends generally expect from a load-balancer like TLS or HTTP routing.
operability
With some cooperation, any expected change should be seamless: rolling out new software on the backends, adding or removing backends, or scaling the load-balancing layer itself up or down.

The problem and its solutions are well known. From recently published articles on the topic, “Introduction to modern network load-balancing and proxying” provides an overview of the state of the art. Google released “Maglev: A Fast and Reliable Software Network Load Balancer” describing their in-house solution in detail.2 However, the associated software is not available. Basically, building a load-balancing solution with commodity servers consists of assembling three components:

  • ECMP routing
  • stateless L4 load-balancing
  • stateful L7 load-balancing

In this article, I describe and support a multi-tier solution using Linux and only open-source components. It should offer you the basis to build a production-ready load-balancing layer.

Update (2018.05)

Facebook just released Katran, an L4 load-balancer implemented with XDP and eBPF and using consistent hashing. It could be inserted in the configuration described below.

Last tier: L7 load-balancing🔗

Let’s start with the last tier. Its role is to provide high availability, by forwarding requests to only healthy backends, and scalability, by spreading requests fairly between them. Working in the highest layers of the OSI model, it can also offer additional services, like TLS termination, HTTP routing, header rewriting, rate-limiting of unauthenticated users, and so on. Being stateful, it can leverage complex load-balancing algorithms. Being the first point of contact with backend servers, it should ease maintenance and minimize impact during daily changes.

L7 load-balancers
The last tier of the load-balancing solution is a set of L7 load-balancers receiving user connections and forwarding them to the backends.

It also terminates client TCP connections. This introduces some loose coupling between the load-balancing components and the backend servers with the following benefits:

  • connections to servers can be kept open for lower resource use and latency,
  • requests can be retried transparently in case of failure,
  • clients can use a different IP protocol than servers, and
  • servers do not have to care about path MTU discovery, TCP congestion control algorithms, avoidance of the TIME-WAIT state and various other low-level details.

Many pieces of software would fit in this layer and an ample literature exists on how to configure them. You could look at HAProxy, Envoy or Træfik. Here is a configuration example for HAProxy:

# L7 load-balancer endpoint
frontend l7lb
  # Listen on both IPv4 and IPv6
  bind :80 v4v6
  # Redirect everything to a default backend
  default_backend servers
  # Healthchecking
  acl dead nbsrv(servers) lt 1
  acl disabled nbsrv(enabler) lt 1
  monitor-uri /healthcheck
  monitor fail if dead || disabled

# IPv6-only servers with HTTP healthchecking and remote agent checks
backend servers
  balance roundrobin
  option httpchk
  server web1 [2001:db8:1:0:2::1]:80 send-proxy check agent-check agent-port 5555
  server web2 [2001:db8:1:0:2::2]:80 send-proxy check agent-check agent-port 5555
  server web3 [2001:db8:1:0:2::3]:80 send-proxy check agent-check agent-port 5555
  server web4 [2001:db8:1:0:2::4]:80 send-proxy check agent-check agent-port 5555

# Fake backend: if the local agent check fails, we assume we are dead
backend enabler
  server enabler [::1]:0 agent-check agent-port 5555
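
Before reloading HAProxy with a configuration like this, it can be validated first (adjust the path to wherever your configuration actually lives):

haproxy -c -f /etc/haproxy/haproxy.cfg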

This configuration is the most incomplete piece of this guide. However, it illustrates two key concepts for operability:

  1. Healthchecking of the web servers is done both at HTTP-level (with check and option httpchk) and using an auxiliary agent check (with agent-check). The latter makes it easy to put a server into maintenance or to orchestrate a progressive rollout. On each backend, you need a process listening on port 5555 and reporting the status of the service (UP, DOWN, MAINT). A simple socat process can do the trick:3

    socat -ly \
      TCP6-LISTEN:5555,ipv6only=0,reuseaddr,fork \
      OPEN:/etc/lb/agent-check,rdonly
    

    Put UP in /etc/lb/agent-check when the service is in nominal mode. If the regular healthcheck is also positive, HAProxy will send requests to this node. When you need to put it in maintenance, write MAINT and wait for the existing connections to terminate. Use READY to cancel this mode (see the short example after this list).

  2. The load-balancer itself should provide a healthcheck endpoint (/healthcheck) for the upper tier. It will return a 503 error either if there are no backend servers available or if the enabler backend has been put down through the agent check. The same mechanism as for regular backends can be used to signal the unavailability of this load-balancer.
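
For example, draining and re-enabling a backend by hand is just a matter of writing to the agent-check file (using the values described in the first item above):

# stop sending new requests to this backend
echo MAINT > /etc/lb/agent-check
# ...do the maintenance, then cancel maintenance mode
echo READY > /etc/lb/agent-check
# and mark the service as nominal again
echo UP > /etc/lb/agent-check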

Additionally, the send-proxy directive enables the proxy protocol to transmit the real clients’ IP addresses. This protocol also works for non-HTTP connections and is supported by a variety of servers, including nginx:

http {
  server {
    listen [::]:80 default ipv6only=off proxy_protocol;
    root /var/www;
    set_real_ip_from ::/0;
    real_ip_header proxy_protocol;
  }
}

As is, this solution is not complete. We have just moved the availability and scalability problem somewhere else. How do we load-balance the requests between the load-balancers?

First tier: ECMP routing🔗

On most modern routed IP networks, redundant paths exist between clients and servers. For each packet, routers have to choose a path. When the cost associated with each path is equal, incoming flows4 are load-balanced among the available destinations. This characteristic can be used to balance connections among available load-balancers:

ECMP routing
ECMP routing is used as a first tier. Flows are spread among available L7 load-balancers. Routing is stateless and asymmetric. Backend servers are not represented.

There is little control over the load-balancing, but ECMP routing brings the ability to scale both tiers horizontally. A common way to implement such a solution is to use BGP, a routing protocol to exchange routes between network equipment. Each load-balancer announces to its connected routers the IP addresses it is serving.

If we assume you already have BGP-enabled routers available, ExaBGP is a flexible solution to let the load-balancers advertise their availability. Here is a configuration for one of the load-balancers:

# Healthcheck for IPv6
process service-v6 {
  run python -m exabgp healthcheck -s --interval 10 --increase 0 --cmd "test -f /etc/lb/v6-ready -a ! -f /etc/lb/disable";
  encoder text;
}

template {
  # Template for IPv6 neighbors
  neighbor v6 {
    router-id 192.0.2.132;
    local-address 2001:db8::192.0.2.132;
    local-as 65000;
    peer-as 65000;
    hold-time 6;
    family {
      ipv6 unicast;
    }
    api services-v6 {
      processes [ service-v6 ];
    }
  }
}

# First router
neighbor 2001:db8::192.0.2.254 {
  inherit v6;
}

# Second router
neighbor 2001:db8::192.0.2.253 {
  inherit v6;
}

If /etc/lb/v6-ready is present and /etc/lb/disable is absent, all the IP addresses configured on the lo interface will be announced to both routers. If the other load-balancers use a similar configuration, the routers will distribute incoming flows between them. Some external process should manage the existence of the /etc/lb/v6-ready file by checking the health of the load-balancer (using the /healthcheck endpoint for example). An operator can remove a load-balancer from the rotation by creating the /etc/lb/disable file.
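
In practice, taking a load-balancer out of rotation and putting it back is then a one-liner each way (paths as configured above):

# withdraw the service addresses from BGP on this load-balancer
touch /etc/lb/disable
# ...wait for flows to drain, do the maintenance, then re-announce
rm /etc/lb/disable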

To get more details on this part, have a look at “High availability with ExaBGP.” If you are in the cloud, this tier is usually implemented by your cloud provider, either using an anycast IP address or a basic L4 load-balancer.

Unfortunately, this solution is not resilient when an expected or unexpected change happens. Notably, when adding or removing a load-balancer, the number of available routes for a destination changes. The hashing algorithm used by routers is not consistent and flows are reshuffled among the available load-balancers, breaking existing connections:

Stability of ECMP routing 1/2
ECMP routing is unstable when a change happens. An additional load-balancer is added to the pool and the flows are routed to different load-balancers, which do not have the appropriate entries in their connection tables.

Moreover, each router may choose its own routes. When a router becomes unavailable, the second one may route the same flows differently:

Stability of ECMP routing 2/2
A router becomes unavailable and the remaining router load-balances its flows differently. One of them is routed to a different load-balancer, which does not have the appropriate entry in its connection table.

If you think this is not an acceptable outcome, notably if you need to handle long connections like file downloads, video streaming or websocket connections, you need an additional tier. Keep reading!

Second tier: L4 load-balancing🔗

The second tier is the glue between the stateless world of IP routers and the stateful land of L7 load-balancing. It is implemented with L4 load-balancing. The terminology can be a bit confusing here: this tier routes IP datagrams (no TCP termination) but the scheduler uses both destination IP and port to choose an available L7 load-balancer. The purpose of this tier is to ensure all members take the same scheduling decision for an incoming packet.

There are two options:

  • stateful L4 load-balancing with state synchronization across the members, or
  • stateless L4 load-balancing with consistent hashing.

The first option increases complexity and limits scalability. We won’t use it.5 The second option is less resilient during some changes but can be enhanced with a hybrid approach using a local state.

We use IPVS, a performant L4 load-balancer running inside the Linux kernel, with Keepalived, a frontend to IPVS with a set of healthcheckers to kick out an unhealthy component. IPVS is configured to use the Maglev scheduler, a consistent hashing algorithm from Google. Among its family, this is a great algorithm because it spreads connections fairly, minimizes disruptions during changes and is quite fast at building its lookup table. Finally, to improve performance, we let the last tier—the L7 load-balancers—send back answers directly to the clients without involving the second tier—the L4 load-balancers. This is referred to as direct server return (DSR) or direct routing (DR).

Second tier: L4 load-balancing
L4 load-balancing with IPVS and consistent hashing as a glue between the first tier and the third tier. Backend servers have been omitted. Dotted lines represent the path for the return packets.

With such a setup, we expect packets from a flow to be able to move freely between the components of the two first tiers while sticking to the same L7 load-balancer.

Configuration🔗

Assuming ExaBGP has already been configured like described in the previous section, let’s start with the configuration of Keepalived:

virtual_server_group VS_GROUP_MH_IPv6 {
  2001:db8::198.51.100.1 80
}
virtual_server group VS_GROUP_MH_IPv6 {
  lvs_method TUN  # Tunnel mode for DSR
  lvs_sched mh    # Scheduler: Maglev
  sh-port         # Use port information for scheduling
  protocol TCP
  delay_loop 5
  alpha           # All servers are down on start
  omega           # Execute quorum_down on shutdown
  quorum_up   "/bin/touch /etc/lb/v6-ready"
  quorum_down "/bin/rm -f /etc/lb/v6-ready"

  # First L7 load-balancer
  real_server 2001:db8::192.0.2.132 80 {
    weight 1
    HTTP_GET {
      url {
        path /healthcheck
        status_code 200
      }
      connect_timeout 2
    }
  }

  # Many others...
}

The quorum_up and quorum_down statements define the commands to be executed when the service becomes available and unavailable respectively. The /etc/lb/v6-ready file is used as a signal to ExaBGP to advertise the service IP address to the neighbor routers.

Additionally, IPVS needs to be configured to continue routing packets from a flow that was moved from another L4 load-balancer. It should also continue routing packets to destinations that have become unavailable, to ensure we can properly drain an L7 load-balancer.

# Schedule non-SYN packets
sysctl -qw net.ipv4.vs.sloppy_tcp=1
# Do NOT reschedule a connection when destination
# doesn't exist anymore
sysctl -qw net.ipv4.vs.expire_nodest_conn=0
sysctl -qw net.ipv4.vs.expire_quiescent_template=0
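
To make these settings survive a reboot on a systemd-based machine, one option is a sysctl drop-in (the file names below are arbitrary examples); note that the ip_vs module has to be loaded before the net.ipv4.vs keys exist:

# load IPVS early so the sysctl keys are present at boot
echo ip_vs > /etc/modules-load.d/ipvs.conf
cat > /etc/sysctl.d/90-ipvs.conf <<'EOF'
net.ipv4.vs.sloppy_tcp = 1
net.ipv4.vs.expire_nodest_conn = 0
net.ipv4.vs.expire_quiescent_template = 0
EOF
sysctl --system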

The Maglev scheduling algorithm will be available with Linux 4.18, thanks to Inju Song. For older kernels, I have prepared a backport.6 Use of source hashing as a scheduling algorithm will hurt the resilience of the setup.

DSR is implemented using the tunnel mode. This method is compatible with routed datacenters and cloud environments. Requests are tunneled to the scheduled peer using IPIP encapsulation. It adds a small overhead and may lead to MTU issues. If possible, ensure you are using a larger MTU for communication between the second and the third tier.7 Otherwise, it is better to explicitly allow fragmentation of IP packets:

sysctl -qw net.ipv4.vs.pmtu_disc=0

You also need to configure the L7 load-balancers to handle encapsulated traffic:8

# Setup IPIP tunnel to accept packets from any source
ip tunnel add tunlv6 mode ip6ip6 local 2001:db8::192.0.2.132
ip link set up dev tunlv6
ip addr add 2001:db8::198.51.100.1/128 dev tunlv6
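
Back on the L4 load-balancers, once Keepalived is running you can check that the virtual service and its destinations look sane (assuming the ipvsadm userspace tool is installed):

# list the IPVS virtual services and their real servers
ipvsadm -L -n
# list the current connection entries
ipvsadm -L -n -c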

Evaluation of the resilience🔗

As configured, the second tier increases the resilience of this setup for two reasons:

  1. The scheduling algorithm is using a consistent hash to choose its destination. Such an algorithm reduces the negative impact of expected or unexpected changes by minimizing the number of flows moving to a new destination. “Consistent Hashing: Algorithmic Tradeoffs” offers more details on this subject.

  2. IPVS keeps a local connection table for known flows. When a change impacts only the third tier, existing flows will be correctly directed according to the connection table.

If we add or remove an L4 load-balancer, existing flows are not impacted because each load-balancer takes the same decision, as long as they see the same set of L7 load-balancers:

L4 load-balancing instability 1/3
Losing an L4 load-balancer has no impact on existing flows. Each arrow is an example of a flow. The dots are flow endpoints bound to the associated load-balancer. If they had moved to another load-balancer, the connection would have been lost.

If we add an L7 load-balancer, existing flows are not impacted either because only new connections will be scheduled to it. For existing connections, IPVS will look at its local connection table and continue to forward packets to the original destination. Similarly, if we remove an L7 load-balancer, only existing flows terminating at this load-balancer are impacted. Other existing connections will be forwarded correctly:

L4 load-balancing instability 2/3
Losing an L7 load-balancer only impacts the flows bound to it.

We need to have simultaneous changes on both levels to get a noticeable impact. For example, when adding both an L4 load-balancer and an L7 load-balancer, only connections moved to an L4 load-balancer without state and scheduled to the new load-balancer will be broken. Thanks to the consistent hashing algorithm, other connections will stay bound to the right L7 load-balancer. During a planned change, this disruption can be minimized by adding the new L4 load-balancers first, waiting a few minutes, then adding the new L7 load-balancers.

L4 load-balancing instability 3/3
Both an L4 load-balancer and an L7 load-balancer come back to life. The consistent hash algorithm ensures that only one fifth of the existing connections would be moved to the incoming L7 load-balancer. Some of them continue to be routed through their original L4 load-balancer, which mitigates the impact.

Additionally, IPVS correctly routes ICMP messages to the same L7 load-balancers as the associated connections. This ensures notably path MTU discovery works and there is no need for smart workarounds.

Tier 0: DNS load-balancing🔗

Optionally, you can add DNS load-balancing to the mix. This is useful if your setup spans multiple datacenters or multiple cloud regions, or if you want to break a large load-balancing cluster into smaller ones. It is not intended to replace the first tier as it doesn’t share the same characteristics: load-balancing is unfair (it is not flow-based) and recovery from a failure is slow.

Complete load-balancing solution
A complete load-balancing solution spanning two datacenters.

gdnsd is an authoritative-only DNS server with integrated healthchecking. It can serve zones from master files using the RFC 1035 zone format:

@ SOA ns1 ns1.example.org. 1 7200 1800 259200 900
@ NS ns1.example.com.
@ NS ns1.example.net.
@ MX 10 smtp

@     60 DYNA multifo!web
www   60 DYNA multifo!web
smtp     A    198.51.100.99

The special RR type DYNA will return A and AAAA records after querying the specified plugin. Here, the multifo plugin implements an all-active failover of monitored addresses:

service_types => {
  web => {
    plugin => http_status
    url_path => /healthcheck
    down_thresh => 5
    interval => 5
  }
  ext => {
    plugin => extfile
    file => /etc/lb/ext
    def_down => false
  }
}

plugins => {
  multifo => {
    web => {
      service_types => [ ext, web ]
      addrs_v4 => [ 198.51.100.1, 198.51.100.2 ]
      addrs_v6 => [ 2001:db8::198.51.100.1, 2001:db8::198.51.100.2 ]
    }
  }
}

In nominal state, an A request will be answered with both 198.51.100.1 and 198.51.100.2. A healthcheck failure will update the returned set accordingly. It is also possible to administratively remove an entry by modifying the /etc/lb/ext file. For example, with the following content, 198.51.100.2 will not be advertised anymore:

198.51.100.1 => UP
198.51.100.2 => DOWN
2001:db8::c633:6401 => UP
2001:db8::c633:6402 => UP
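
You can watch the effect of the healthchecks and of the extfile by querying the daemon directly; the zone and server names below are only placeholders matching the example records above:

dig +short A www.example.org @ns1.example.org
dig +short AAAA www.example.org @ns1.example.org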

You can find all the configuration files and the setup of each tier in the GitHub repository. If you want to replicate this setup at a smaller scale, it is possible to collapse the second and the third tiers by using either localnode or network namespaces. Even if you don’t need its fancy load-balancing services, you should keep the last tier: while backend servers come and go, the L7 load-balancers bring stability, which translates to resiliency.


  1. In this article, “backend servers” are the servers behind the load-balancing layer. To avoid confusion, we will not use the term “frontend.” ↩︎

  2. A good summary of the paper is available from Adrian Colyer. From the same author, you may also have a look at the summary for “Stateless datacenter load-balancing with Beamer.” ↩︎

  3. If you feel this solution is fragile, feel free to develop your own agent. It could coordinate with a key-value store to determine the wanted state of the server. It is possible to centralize the agent in a single location, but you may get a chicken-and-egg problem to ensure its availability. ↩︎

  4. A flow is usually determined by the source and destination IP and the L4 protocol. Alternatively, the source and destination port can also be used. The router hashes this information to choose the destination. For Linux, you may find more information on this topic in “Celebrating ECMP in Linux.” ↩︎

  5. On Linux, it can be implemented by using Netfilter for load-balancing and conntrackd to synchronize state. IPVS only provides active/backup synchronization. ↩︎

  6. The backport is not strictly equivalent to its original version. Be sure to check the README file to understand the differences. Briefly, in Keepalived configuration, you should:

    • not use inhibit_on_failure
    • use sh-port
    • not use sh-fallback

    ↩︎

  7. At least 1520 for IPv4 and 1540 for IPv6. ↩︎

  8. As is, this configuration is insecure. You need to ensure only the L4 load-balancers will be able to send IPIP traffic. ↩︎

by Vincent Bernat at May 23, 2018 08:59 AM

May 22, 2018

SysAdmin1138

Database schemas vs. identity

Yesterday brought this tweet up:

This is amazingly bad wording, and is the kind of thing that made the transpeople in my timeline (myself included) go "Buwhuh?" and made me wonder if this was a Snopes-worthy story.

No, actually.

The key phrase here is, "submit your prints for badges".

There are two things you should know:

  1. NASA works on National Security related things, which requires a security clearance to work on, and getting one of those requires submitting prints.
  2. The FBI is the US Government's authority in handling biometric data

Here is a chart from the Electronic Biometric Transmission Specification, which describes a kind of API for dealing with biometric data.

If Following Condition Exists                                          Enter Code
Subject's gender reported as female                                    F
Occupation or charge indicated "Male Impersonator"                     G
Subject's gender reported as male                                      M
Occupation or charge indicated "Female Impersonator" or transvestite   N
Male name, no gender given                                             Y
Female name, no gender given                                           Z
Unknown gender                                                         X

Source: EBTS Version 10.0 Final, page 118.

Yep, it really does use the term "Female Impersonator". To a transperson living in 2016 getting their first Federal job (even as a contractor), running into these very archaic terms is extremely off-putting.

As someone said in a private channel:

This looks like some 1960's bureaucrat trying to be 'inclusive'

This is not far from the truth.

This table exists unchanged in the 7.0 version of the document, dated January 1999. Previous versions are in physical binders somewhere, and not archived on the Internet; but the changelog for the V7 document indicates that this wording was in place as early as 1995. Mention is also made of being interoperable with UK law-enforcement.

The NIST standard for fingerprints issued in 1986 mentions a SEX field, but it only has M, F, and U; later NIST standards drop this field definition entirely.

As this field was defined in a standard over 20 years ago and has not been changed, is used across the full breadth of the US justice system, is referenced in international communications standards including visa travel, and is used as the basis for US military standards, these field definitions are effectively immutable and will only change after concerted effort over decades.

This is what institutionalized transphobia looks like, and we will be dealing with it for another decade or two. If not longer.


The way to deal with this is to deprecate the codes in documentation, but still allow them as valid.

  • Create a deprecation notice in the definition of the field saying that the G and N values are to be considered deprecated and should not be used.
  • In the deprecation notice say that in the future, new records will not be accepted with those values.
  • Those values will remain valid for queries, because there are decades of legacy-coding in databases using them.

The failure-mode of this comes in with form designers who look at the spec and build forms based on the spec. Like this example from Maryland. Which means we need to let the forms designers know that the spec needs to be selectively ignored. The deprecation notice does that.

At the local level, convince your local City Council to pass resolutions to modernize their Police forms to reflect modern sensibilities, and drop the G and N codes from intake forms. Do this at the County too, for the Sheriff's department.

At the state level, convince your local representatives to push resolutions to get the State Patrol to modernize their forms likewise. Drop the G and N codes from the forms.

At the Federal employee level, there is less to be done here as you're closer to the governing standards, but you may be able to convince The Powers That Be to drop the two offensive checkboxes or items from the drop-down list.

At the Federal standard level, lobby the decision makers that govern this standard and push for a deprecation notice. If any of your congress-people are on any Judiciary committees, you'll have more luck than most.

by SysAdmin1138 at May 22, 2018 03:35 PM

May 18, 2018

OpenSSL

New LTS Release

Back around the end of 2014 we posted our release strategy. This was the first time we defined support timelines for our releases, and added the concept of an LTS (long-term support) release. At our OMC meeting earlier this month, we picked our next LTS release. This post walks through that announcement, and tries to explain all the implications of it.

Once an official release is made, it then enters support mode. No new features are added – those only go into the next release. In rare cases we will make an exception; for example, we said that if any accessors or setters are missing in 1.1.0, because of structures being made opaque, we would treat that as a bug.

Support itself is divided into three phases. First, there is active and ongoing support. All bugs are appropriate for this phase. This happens once the release is published. Next is the security-only phase, where we only fix security bugs, which will typically have a CVE associated with them. This happens for the final year of support. Finally, there is EOL (end of life), where the project no longer provides any support or fixes.

In the typical case, a release is supported for at least two years, which means one year of fixes and one year of security-only fixes. Some releases, however, are designated as LTS releases. They are supported for at least five years. We will specify an LTS release at least every four years, which gives the community at least a year to migrate.

Our current LTS release is 1.0.2, and it will be supported until the end of 2019. During that last year it will only receive security fixes. Although we are extending 1.1.0 support, we explicitly decided not to do it again, for either release.

Our next LTS release will be 1.1.1 which is currently in beta. As long as the release is out before the end of 2018, there is more than a year to migrate. (We’re confident it will be out before then, of course.) We encourage everyone to start porting to the OpenSSL master branch.

The 1.1.0 release will be supported for one year after 1.1.1 is released. And again, during that final year we will only provide security fixes. Fortunately, 1.1.0 is ABI compatible with 1.1.1, so moving up should not be difficult. Our TLS 1.3 wiki page has some more details around the impact of TLS 1.3 support.

Finally, this has an impact on the OpenSSL FIPS module, #1747. That module is valid until January 29, 2022. This means that for the final two-plus years of its validity, we will not be supporting the release on which the module is based. We have already stated that we do not support the module itself; this adds to the burden that vendors will have to take on. On the positive side, we’re committed to a new FIPS module, it will be based on the current codebase, and we think we can get it done fairly quickly.

May 18, 2018 06:00 AM

May 17, 2018

Cryptography Engineering

Was the Efail disclosure horribly screwed up?

TL;DR. No. Or keep reading if you want.

On Monday a team of researchers from Münster, RUB and NXP disclosed serious cryptographic vulnerabilities in a number of encrypted email clients. The flaws, which go by the cute vulnerability name of “Efail”, potentially allow an attacker to decrypt S/MIME or PGP-encrypted email with only minimal user interaction.

By the standards of cryptographic vulnerabilities, this is about as bad as things get. In short: if an attacker can intercept and alter an encrypted email — say, by sending you a new (altered) copy, or modifying a copy stored on your mail server — they can cause many GUI-based email clients to send the full plaintext of the email to an attacker controlled-server. Even worse, most of the basic problems that cause this flaw have been known for years, and yet remain in clients.

The big (and largely under-reported) story of EFail is the way it affects S/MIME. That “corporate” email protocol is simultaneously (1) hated by the general crypto community because it’s awful and has a slash in its name, and yet (2) is probably the most widely-used email encryption protocol in the corporate world. The table at the right — excerpted from the paper — gives you a flavor of how Efail affects S/MIME clients. TL;DR it affects them very badly.

Efail also happens to affect a smaller, but non-trivial number of OpenPGP-compatible clients. As one might expect (if one has spent time around PGP-loving folks) the disclosure of these vulnerabilities has created something of a backlash on HN, and among people who make and love OpenPGP clients. Mostly for reasons that aren’t very defensible.

So rather than write about fun things — like the creation of CFB and CBC gadgets — today, I’m going to write about something much less exciting: the problem of vulnerability disclosure in ecosystems like PGP. And how bad reactions to disclosure can hurt us all.

How Efail was disclosed to the PGP community

Putting together a comprehensive timeline of the Efail disclosure process would probably be a boring, time-intensive project. Fortunately Thomas Ptacek loves boring and time-intensive projects, and has already done this for us.

Briefly, the first Efail disclosures to vendors began last October, more than 200 days prior to the agreed publication date. The authors notified a large number of vulnerable PGP GUI clients, and also notified the GnuPG project (on which many of these projects depend) by February at the latest. From what I can tell every major vendor agreed to make some kind of patch. GnuPG decided that it wasn’t their fault, and basically stopped corresponding.

All parties agreed not to publicly discuss the vulnerability until an agreed date in April, which was later pushed back to May 15. The researchers also notified the EFF and some journalists under embargo, but none of them leaked anything. On May 14 someone dumped the bug onto a mailing list. So the EFF posted a notice about the vulnerability (which we’ll discuss a bit more below), and the researchers put up a website. That’s pretty much the whole story.

There are three basic accusations going around about the Efail disclosure. They can be summarized as (1) maintaining embargoes in coordinated disclosures is really hard, (2) the EFF disclosure “unfairly” made this sound like a serious vulnerability “when it isn’t”, and (3) everything was already patched anyway so what’s the big deal.

Disclosures are hard; particularly coordinated ones

I’ve been involved in two disclosures of flaws in open encryption protocols. (Both were TLS issues.) Each one poses an impossible dilemma. You need to simultaneously (a) make sure every vendor has as much advance notice as possible, so they can patch their software, and (b) avoid telling literally anyone, because nothing on the Internet stays secret. At some point you’ll notify some FOSS project that uses an open development mailing list or ticket server, and the whole problem will leak out into the open.

Disclosing bugs that affect PGP is particularly fraught. That’s because there’s no such thing as “PGP”. What we have instead is a large and distributed community that revolves around the OpenPGP protocol. The pillar of this community is the GnuPG project, which maintains the core GnuPG tool and libraries that many clients rely on. Then there are a variety of niche GUI-based clients and email plugin projects. Finally, there are commercial vendors like Apple and Microsoft. (Who are mostly involved in the S/MIME side of things, and may reluctantly allow PGP plugins.)

Then, of course there are thousands of end-users, who will generally fail to update their software unless something really bad and newsworthy happens.

The obvious solution to the disclosure problem is to use a staged disclosure. You notify the big commercial vendors first, since that’s where most of the affected users are. Then you work your way down the “long tail” of open source projects, knowing that inevitably the embargo could break and everyone will have to patch in a hurry. And you keep in mind that no matter what happens, everyone will blame you for screwing up the disclosure.

For the PGP issues in Efail, the big client vendors are Mozilla (Thunderbird), Microsoft (Outlook) and maybe Apple (Mail). The very next obvious choice would be to patch the GnuPG tool so that it no longer spits out unauthenticated plaintext, which is the root of many of the problems in Efail.

The Efail team appears to have pursued exactly this approach for the client-side vulnerabilities. Sadly, the GnuPG team made the decision that it’s not their job to pre-emptively address problems that they view as ‘clients misusing the GnuPG API’ (my paraphrase), even when that misuse appears to be rampant across many of the clients that use their tool. And so the most obvious fix for one part of the problem was not available.

This is probably the most unfortunate part of the Efail story, because in this case GnuPG is very much at fault. Their API does something that directly violates cryptographic best practices — namely, releasing unauthenticated plaintext prior to producing an error message. And while this could be understood as a reasonable API design at design time, continuing to support this API even as clients routinely misuse it has now led to flaws across the ecosystem. The refusal of GnuPG to take a leadership role in preemptively safeguarding these vulnerabilities both increases the difficulty of disclosing these flaws, and increases the probability of future issues.

So what went wrong with the Efail disclosure?

Despite what you may have heard, given the complexity of this disclosure, very little went wrong. The main issues people have raised seem to have to do with the contents of an EFF post. And with some really bad communications from Robert J. Hansen at the Enigmail (and GnuPG) project.

The EFF post. The Efail researchers chose to use the Electronic Frontier Foundation as their main source for announcing the existence of the vulnerability to the privacy community. This hardly seems unreasonable, because the EFF is generally considered a trusted broker, and speaks to the right community (at least here in the US).

The EFF post doesn’t give many details, nor does it give a list of affected (or patched) clients. It does give two pretty mild recommendations:

  1. Temporarily disable or uninstall your existing clients until you’ve checked that they’re patched.
  2. Maybe consider using a more modern cryptosystem like Signal, at least until you know that your PGP client is safe again.

This naturally led to a huge freakout by many in the PGP community. Some folks, including vendors, have misrepresented the EFF post as essentially pushing people to “permanently” uninstall PGP, which will “put lives at risk” because presumably these users (whose lives are at risk, remember) will immediately fall back to sending incriminating information via plaintext emails — rather than temporarily switching their communications to one of several modern, well-studied secure messengers, or just not emailing for a few hours.

In case you think I’m exaggerating about this, here’s one reaction from ProtonMail:

The most reasonable criticism I’ve heard of the EFF post is that it doesn’t give many details about which clients are patched, and which are vulnerable. This could presumably give someone the impression that this vulnerability is still present in their email client, and thus would cause them to feel less than secure in using it.

I have to be honest that to me that sounds like a really good outcome. The problem with Efail is that it doesn’t matter if your client is secure. The Efail vulnerability could affect you if even a single one of your communication partners is using an insecure client.

So needless to say I’m not very sympathetic to the reaction around the EFF post. If you can’t be sure whether your client is secure, you probably should feel insecure.

Bad communications from GnuPG and Enigmail. On the date of the disclosure, anyone looking for accurate information about security from two major projects — GnuPG and Enigmail — would not have been able to find it.

They wouldn’t have found it because developers from both Enigmail and GnuPG were on mailing lists and Twitter claiming that they had never heard of Efail, and hadn’t been notified by the researchers. Needless to say, these allegations took off around the Internet, sometimes in place of real information that could have helped users (like, whether either project had patched.)

It goes without saying that neither allegation was actually true. In fact, members of both projects soon checked with their fellow developers (and their memories) and found out that they’d both been given months of notice by the researchers, and that Enigmail had even developed a patch. (However, it turned out that even this patch may not perfectly address the issue, and the community is still working to figure out exactly what still needs to be done.)

This is an understandable mistake, perhaps. But it sure is a bad one.

PGP is bad technology and it’s making a bad community

Now that I’ve made it clear that neither the researchers nor the EFF is out to get the PGP community, let me put on my mask and horns and tell you why someone should be.

I’ve written extensively about PGP on this blog, though in the past mostly from a technical point of view, focusing on the problems with PGP itself. But what’s really problematic about PGP is not just the cryptography; it’s the story it tells about path dependence and how software communities work.

The fact of the matter is that OpenPGP is not really a cryptography project. That is, it’s not held together by cryptography. It’s held together by backwards-compatibility and (increasingly) a kind of obsession with the idea of PGP as an end in and of itself, rather than as a means to actually make end-users more secure.

Let’s face it, as a protocol, PGP/OpenPGP is just not what we’d develop if we started over today. It was formed over the years out of mostly experimental parts, which were in turn replaced, bandaged and repaired — and then worked into numerous implementations, which all had to be insanely flexible and yet compatible with one another. The result is bad, and most of the software implementing it is worse. It’s the equivalent of a beloved antique sports car, where the electrical system is totally shot, but it still drives. You know, the kind of car where the owner has to install a hand-switch so he can turn the reverse lights on manually whenever he wants to pull out of a parking space.

If PGP went away, I estimate it would take the security community less than a year to entirely replace (the key bits of) the standard with something much better and more modern. It would have modern crypto and authentication, and maybe even extensions for future post-quantum security. It would be simple. Many bright new people would get involved to help write the inevitable Rust, Go and Javascript clients and libraries.

Unfortunately for us all, (Open)PGP does exist. And that means that even fancy greenfield email projects feel like they need to support OpenPGP, or at least some subset of it. This in turn perpetuates the PGP myth, and causes other clients to use it. And as a direct result, even if some clients re-implement OpenPGP from scratch, other clients will end up using tools like GnuPG which will support unauthenticated encryption with bad APIs. And the cycle will go round and around, like a spaceship stuck near the event horizon of a black hole.

And as the standard perpetuates itself, largely for the sake of being a standard, it will fail to attract new security people. It will turn away exactly the type of people who should be working on these tools. Those people will go off and build encryption systems in a totally different area, or they’ll get into cryptocurrency. And — with some exceptions — the people who work in the community will increasingly work in that community because they’re supporting PGP, and not because they’re trying to seek out the best security technologies for their users. And the serious (email) users of PGP will be using it because they like the idea of using PGP better than they like using an actual, secure email standard.

And as things get worse, and fail to develop, people who work on it will become more dogmatic about its importance, because it’s something threatened and not a real security protocol that anyone’s using. To me that’s where PGP is going today, and that is why the community has such a hard time motivating itself to take these vulnerabilities seriously, and instead reacts defensively.

Maybe that’s a random, depressing way to end a post. But that’s the story I see in OpenPGP. And it makes me really sad.

by Matthew Green at May 17, 2018 07:06 PM

May 16, 2018

OpenSSL

Changing the Guiding Principles in Our Security Policy

“That we remove “We strongly believe that the right to advance patches/info should not be based in any way on paid membership to some forum. You can not pay us to get security patches in advance.” from the security policy and Mark posts a blog entry to explain the change including that we have no current such service.”

At the OpenSSL Management Committee meeting earlier this month we passed the vote above to remove a section of our security policy. Part of that vote was that I would write this blog post to explain why we made this change.

At each face to face meeting we aim to ensure that our policies still match the views of the current committee membership, and we will vote to change those that don’t.

Prior to 2018 our Security Policy used to contain a lot of background information on why we selected the policy we did, justifying it and adding lots of explanatory detail. We included details of things we’d tried before and things that worked and didn’t work to arrive at our conclusion. At our face to face meeting in London at the end of 2017 we decided to remove a lot of the background information and stick to explaining the policy simply and concisely. I split out what were the guiding principles from the policy into their own list.

OpenSSL has some full-time fellows who are paid from various revenue sources coming into OpenSSL, including sponsorship and support contracts. We’ve discussed having the option in the future to share patches for security issues in advance with these support contract customers. We already share serious issues a little in advance with some OS vendors (doing so remains a principle in the policy), and this practice has helped ensure that the patches and advisory get an extra level of testing before being released.

Thankfully there are relatively few serious issues in OpenSSL these days; the last one worse than Moderate severity was in February 2017.

In the vote text we wrote that we have “no current such service” and neither do we have any plan right now to create such a service. But we allow ourselves to consider such a possibility in the future now that this principle, which no longer represents the view of the OMC, is removed.

May 16, 2018 09:00 PM

May 15, 2018

TaoSecurity

Bejtlich Joining Splunk


Since posting Bejtlich Moves On I've been rebalancing work, family, and personal life. I invested in my martial arts interests, helped more with home duties, and consulted through TaoSecurity.

Today I'm pleased to announce that, effective Monday May 21st 2018, I'm joining the Splunk team. I will be Senior Director for Security and Intelligence Operations, reporting to our CISO, Joel Fulton. I will help build teams to perform detection and monitoring operations, digital forensics and incident response, and threat intelligence. I remain in the northern Virginia area and will align with the Splunk presence in Tyson's Corner.

I'm very excited by this opportunity for four reasons. First, the areas for which I will be responsible are my favorite aspects of security. Long-time blog readers know I'm happiest detecting and responding to intruders! Second, I already know several people at the company, one of whom began this journey by Tweeting about opportunities at Splunk! These colleagues are top notch, and I was similarly impressed by the people I met during my interviews in San Francisco and San Jose.

Third, I respect Splunk as a company. I first used the products over ten years ago, and when I tried them again recently they worked spectacularly, as I expected. Fourth, my new role allows me to be a leader in the areas I know well, like enterprise defense and digital operational art, while building understanding in areas I want to learn, like cloud technologies, DevOps, and security outside enterprise constraints.

I'll have more to say about my role and team soon. Right now I can share that this job focuses on defending the Splunk enterprise and its customers. I do not expect to spend a lot of time in sales cycles. I will likely host visitors in the Tyson's areas from time to time. I do not plan to speak as much with the press as I did at Mandiant and FireEye. I'm pleased to return to operational defense, rather than advise on geopolitical strategy.

If this news interests you, please check our open job listings in information technology. As a company we continue to grow, and I'm thrilled to see what happens next!

by Richard Bejtlich (noreply@blogger.com) at May 15, 2018 06:40 PM

May 14, 2018

ma.ttias.be

Remote Desktop error: CredSSP encryption oracle remediation

The post Remote Desktop error: CredSSP encryption oracle remediation appeared first on ma.ttias.be.

A while back, Microsoft announced it would ship updates to both its RDP client & server components to resolve a critical security vulnerability. That rollout is now happening and many clients have received auto-updates for their client.

As a result, you might see this message/error when connecting to an unpatched Windows server:

It refers to CredSSP updates for CVE-2018-0886, which further explains the vulnerability and why it's been patched now.

But here's the catch: if your client is updated but your server isn't (yet), you can no longer RDP to that machine. Here are a couple of fixes:

  1. Find an old computer/RDP client to connect with
  2. Get console access to the server to run the updates & reboot the machine

If your client has been updated, there's no way to connect to an unpatched Windows server via Remote Desktop anymore.

The post Remote Desktop error: CredSSP encryption oracle remediation appeared first on ma.ttias.be.

by Mattias Geniar at May 14, 2018 08:21 AM

May 08, 2018

Sean's IT Blog

VMware Horizon and Horizon Cloud Enhancements – Part 1

This morning, VMware announced enhancements to both the on-premises Horizon Suite and Horizon Cloud product sets.  Although there are a lot of additions to all products in the Suite, the VMware blog post did not go too in-depth into many of the new features that you’ll be seeing in the upcoming editions.

VMware Horizon 7.5

Let’s start with the biggest news in the blog post – the announcement of Horizon 7.5.  Horizon 7.5 brings several new, long-awaited, features with it.  Some of these features are:

  1. Support for Horizon on VMC (VMware on AWS)
  2. The “Just-in-Time” Management Platform (JMP)
  3. Horizon 7 Extended Service Branch (ESB)
  4. Instant Clone improvements, including support for the new vSphere 6.7 Instant Clone APIs
  5. Support for IPv4/IPv6 Mixed-Mode Operations
  6. Cloud-Pod Architecture support for 200K Sessions
  7. Support for Windows 10 Virtualization-Based Security (VBS) and vTPM on Full Clone Desktops
  8. RDSH Host-based GPO Support for managing protocol settings

I’m not going to touch on all of these items.  I think the first four are the most important for this portion of the suite.

Horizon on VMC

Horizon on VMC is a welcome addition to the Horizon portfolio.  Unlike Citrix, the traditional VMware Horizon product has not had a good cloud story because it has been tightly coupled to the VMware SDDC stack.  By enabling VMC support for Horizon, customers can now run virtual desktops in AWS, or utilize VMC as a disaster recovery option for Horizon environments.

Full clone desktops will be the only desktop type supported in the initial release of Horizon on VMC.  Instant Clones will be coming in a future release, but some additional development work will be required since Horizon will not have the same access to vCenter in VMC as it has in on-premises environments.  I’m also hearing that Linked Clones and Horizon Composer will not be supported in VMC.

The initial release of Horizon on VMC will only support core Horizon, the Unified Access Gateway, and VMware Identity Manager.  Other components of the Horizon Suite, such as UEM, vRealize Operations, and App Volumes have not been certified yet (although there should be nothing stopping UEM from working in Horizon on VMC because it doesn’t rely on any vSphere components).  Security Server, Persona Management, and ThinApp will not be supported.

Horizon Extended Service Branches

Under the current release cadence, VMware targets one Horizon 7 release per quarter.  The current support policy for Horizon states that a release only continues to receive bug fixes and security patches if a new point release hasn’t been available for at least 60 days.  Let’s break that down to make it a little easier to understand.

  1. VMware will support any version of Horizon 7.x for the lifecycle of the product.
  2. If you are currently running the latest Horizon point release (ex. Horizon 7.4), and you find a critical bug/security issue, VMware will issue a hot patch to fix it for that version.
  3. If you are running Horizon 7.4, and Horizon 7.5 has been out for less than 60 days when you find a critical bug/security issue, VMware will issue a hot patch to fix it for that version.
  4. If you are running Horizon 7.4, and Horizon 7.5 has been out for more than 60 days when you find a critical bug/security issue, the fix for the bug will be applied to Horizon 7.5 or later, and you will need to upgrade to receive the fix.

In larger environments, Horizon upgrades can be non-trivial efforts that enterprises may not undertake every quarter.  There are also some verticals, such as healthcare, where core business applications are certified against specific versions of a product, and upgrading or moving away from that certified version can impact support or support costs for key business applications.

With Horizon 7.5, VMware is introducing a long-term support bundle for the Horizon Suite.  This bundle will be called the Extended Service Branch (ESB), and it will contain Horizon 7, App Volumes, User Environment Manager, and Unified Access Gateway.  The ESB will have 2 years of active support from release date where it will receive hot fixes, and each ESB will receive three service packs with critical bug and security fixes and support for new Windows 10 releases.  A new ESB will be released approximately every twelve months.

Each ESB branch will support approximately 3-4 Windows 10 builds, including any recent LTSC builds.  That means the Horizon 7.5 ESB release will support the 1709, 1803, 1809 and 1809 LTSC builds of Windows 10.

This packaging is nice for enterprise organizations that want to limit the number of Horizon upgrades they apply in a year or that require long-term support for core business applications.  I see this being popular in healthcare environments.

Extended Service Branches do not require any additional licensing, and customers will have the option to adopt either the current release cadence or the extended service branch when implementing their environment.

JMP

The Just-in-Time Management Platform, or JMP, is a new component of the Horizon Suite.  The intention is to bring together Horizon, Active Directory, App Volumes, and User Environment Manager to provide a single portal for provisioning instant clone desktops, applications, and policies to users.  JMP also brings a new, HTML5 interface to Horizon.

I’m a bit torn on the concept.  I like the idea behind JMP and providing a portal for enabling user self-provisioning.  But I’m not sure building that portal into Horizon is the right place for it.  A lot of organizations use Active Directory Groups as their management layer for Horizon Desktop Pools and App Volumes.  There is a good reason for doing it this way.  It’s easy to audit who has desktop or application access, and there are a number of ways to easily generate reports on Active Directory Group membership.

Many customers that I talk to are also attempting to standardize their IT processes around an ITSM platform that includes a Service Catalog.  The most common one I run across is ServiceNow.  The customers that I’ve talked to that want to implement self-service provisioning of virtual desktops and applications often want to do it in the context of their service catalog and approval workflows.

It’s not clear right now if JMP will include an API that will allow customers to integrate it with an existing service catalog or service desk tool.  If it does include an API, then I see it being an important part of automated, self-service end-user computing solutions.  If it doesn’t, then it will likely be yet another user interface, and the development cycles would have been better spent on improving the Horizon and App Volumes APIs.

Not every customer will be utilizing a service catalog, ITSM tool and orchestration. For those customers, JMP could be an important way to streamline IT operations around virtual desktops and applications and provide them some benefits of automation.

Instant Clone Enhancements

The release of vSphere 6.7 brought with it new Instant Clone APIs.  The new APIs bring features to VMFork that seem new to pure vSphere admins but have been available to Horizon for some time, such as vMotion.  The new APIs are why Horizon 7.4 does not support vSphere 6.7 for Instant Clone desktops.

Horizon 7.5 will support the new vSphere 6.7 Instant Clone APIs.  It is also backward compatible with the existing vSphere 6.0 and 6.5 Instant Clone APIs.

There are some other enhancements coming to Instant Clones as well.  Instant Clones will now support vSGA and Soft3D.  These settings can be configured in the parent image.  And if you’re an NVIDIA vGPU customer, more than one vGPU profile will be supported per cluster when GPU Consolidation is turned on.  NVIDIA GRID can only run a single profile per discrete GPU, so this feature will be great for customers that have Maxwell-series boards, especially the Tesla M10 high-density board that has four discrete GPUs.  However, I’m not sure how beneficial it will be for customers that adopt Pascal-series or Volta-series Tesla cards, as these only have a single discrete GPU per board.  There may be some additional design considerations that need to be worked out.

Finally, there is one new Instant Clone feature for VSAN customers.  Before I explain the feature, I need to explain how Horizon utilizes VMFork and Instant Clone technology.  Horizon doesn’t just utilize VMFork – it adds its own layers of management on top of it to overcome the limitations of the first generation technology.  This is how Horizon was able to support Instant Clone vMotion when the standard VMFork could not.

This additional layer of management also allows VMware to do other cool things with Horizon Instant Clones without having to make major changes to the underlying platform.  One of the new features that is coming in Horizon 7.5 for VSAN customers is the ability to use Instant Clones across cluster boundaries.

For those who aren’t familiar with VSAN, it is VMware’s software-defined storage product.  The storage boundary for VSAN aligns with the ESXi cluster, so I’m not able to stretch a VSAN datastore between vSphere clusters.  So if I’m running a large EUC environment using VSAN, I may need multiple clusters to meet the needs of my user base.  And unlike 3-tier storage, I can’t share VSAN datastores between clusters.  Under the current setup in Horizon 7.4, I would need to have a copy of my gold/master/parent image in each cluster.

Due to some changes made in Horizon 7.5, I can now share an Instant Clone gold/master/parent image across VSAN clusters without having to make a copy of it in each cluster first.  I don’t have too many specific details on how this will work, but it could significantly reduce the management burden of large, multi-cluster Horizon environments on VSAN.

Blast Extreme Enhancements

The addition of Blast Extreme Adaptive Transport, or BEAT as it’s commonly known, provided an enhanced session remoting experience when using Blast Extreme.  It also required users and administrators to configure which transport they wanted to use in the client, and this could lead to less than optimal user experience for users who frequently moved between locations with good and bad connectivity.

Horizon 7.5 adds some automation and intelligence to BEAT with a feature called Blast Extreme Network Intelligence.  NI will evaluate network conditions on the client side and automatically choose the correct Blast Extreme transport to use.  Users will no longer have to make that choice or make changes in the client.  As a result, the Excellent, Typical, and Poor options are being removed from future versions of the Horizon client.

Another major enhancement coming to Blast Extreme is USB Redirection Port Consolidation.  Currently, USB redirection utilizes a side channel that requires an additional port to be opened in any external-facing firewalls.  Starting in Horizon 7.5, customers will have the option to utilize USB redirection over ports 443/8443 instead of the side channel.

Performance Tracker

The last item I want to cover in this post is Performance Tracker.  Performance Tracker is a tool that Pat Lee demonstrated at VMworld last year, and it presents session performance metrics to end users.  It supports both Blast Extreme and PCoIP, and it provides information such as session latency, frames per second, and Blast Extreme transport type, and helps with troubleshooting connectivity issues between the Horizon Agent and the Horizon Client.

Part 2

As you can see, there is a lot of new stuff in Horizon 7.5.  We’ve hit 1900 words in this post just talking about what’s new in Horizon.  We haven’t touched on client improvements, Horizon Cloud, App Volumes, UEM or Workspace One Intelligence yet.  So we’ll have to break those announcements into another post that will be coming in the next day or two.

by seanpmassey at May 08, 2018 02:09 PM

Everything Sysadmin

SO (my employer) is hiring a Windows SRE/sysadmin in NY/NJ

Come work with Stack Overflow's SRE team!

We're looking for a Windows system administrator / SRE to join our SRE team at Stack Overflow. (The downside is that I'll be your manager... ha ha ha). Anyway... the full job description is here:

https://stackoverflow.com/company/work-here/1152509/

A quick and unofficial FAQ:

Q: NYC/NJ? I thought Stack was a "remote first" company! Whudup with that?

A: While most of the SRE team works remotely, we like to have a few team members near each of our datacenters (Jersey City, NJ and Denver, CO). You won't be spending hours each week pulling cables, I promise you. In fact, we use remote KVMs, and a "remote hands" service for most things. Heck, a lot of our new products are running in "the cloud" (and probably more over time). That said, it's good to have 1-2 people within easy travel distance of the datacenters for emergencies.

Q: Can I work from home?

A: Absolutely. You can work from home (we'll ship you a nice desk, chair and other great stuff) or you can work from our NYC office (see the job advert for a list of perks). Either way, you will need to be able to get to the Jersey City, NJ data center in a reasonable amount of time (like... an hour).

Q: Wait... Windows?

A: Yup. We're a mixed Windows and Linux environment. We're doing a lot of cutting edge stuff with Windows. We were early adopters of PowerShell (if you love PowerShell, definitely apply!) and DSC and a number of other technologies. Microsoft's containers are starting to look good too (hint, hint).

Q: You mentioned another datacenter in Denver, CO. What if I live near there?

A: This position is designated as "NY/NJ". However, watch this space. Or, if you are impatient, contact me and I'll loop you in.

Q: Where do I get more info? How do I apply?

A: https://stackoverflow.com/company/work-here/1152509/

by Tom Limoncelli at May 08, 2018 02:02 AM

May 07, 2018

TaoSecurity

Trying Splunk Cloud

I first used Splunk over ten years ago, but the first time I blogged about it was in 2008. I described how to install Splunk on Ubuntu 8.04. Today I decided to try the Splunk Cloud.

Splunk Cloud is the company's hosted Splunk offering, residing in Amazon Web Services (AWS). You can register for a 15 day free trial of Splunk Cloud that will index 5 GB per day.

If you would like to follow along, you will need a computer with a Web browser to interact with Splunk Cloud. (There may be ways to interact via API, but I do not cover that here.)

I will collect logs from a virtual machine running Debian 9, inside Oracle VirtualBox.

First I registered for the free Splunk Cloud trial online.

After I had a Splunk Cloud instance running, I consulted the documentation for Forward data to Splunk Cloud from Linux. I am running a "self-serviced" instance and not a "managed instance," i.e., I am the administrator in this situation.

I learned that I needed to install a software package called the Splunk Universal Forwarder on my Linux VM.

I downloaded a 64 bit Linux 2.6+ kernel .deb file to the /home/richard/Downloads directory on the Linux VM.

richard@debian:~$ cd Downloads/

richard@debian:~/Downloads$ ls

splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-amd64.deb

With elevated permissions I created a directory for the .deb, changed into the directory, and installed the .deb using dpkg.

richard@debian:~/Downloads$ sudo bash
[sudo] password for richard: 

root@debian:/home/richard/Downloads# mkdir /opt/splunkforwarder

root@debian:/home/richard/Downloads# mv splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-amd64.deb /opt/splunkforwarder/

root@debian:/home/richard/Downloads# cd /opt/splunkforwarder/

root@debian:/opt/splunkforwarder# ls

splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-amd64.deb

root@debian:/opt/splunkforwarder# dpkg -i splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-amd64.deb 

Selecting previously unselected package splunkforwarder.
(Reading database ... 141030 files and directories currently installed.)
Preparing to unpack splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-amd64.deb ...
Unpacking splunkforwarder (7.1.0) ...
Setting up splunkforwarder (7.1.0) ...
complete

root@debian:/opt/splunkforwarder# ls
bin        license-eula.txt
copyright.txt  openssl
etc        README-splunk.txt
ftr        share
include        splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-amd64.deb
lib        splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-x86_64-manifest

Next I changed into the bin directory, ran the splunk binary, and accepted the EULA.

root@debian:/opt/splunkforwarder# cd bin/

root@debian:/opt/splunkforwarder/bin# ls

btool   copyright.txt   openssl slim   splunkmon
btprobe   genRootCA.sh   pid_check.sh splunk   srm
bzip2   genSignedServerCert.sh  scripts splunkd
classify  genWebCert.sh   setSplunkEnv splunkdj

root@debian:/opt/splunkforwarder/bin# ./splunk start

SPLUNK SOFTWARE LICENSE AGREEMENT

THIS SPLUNK SOFTWARE LICENSE AGREEMENT ("AGREEMENT") GOVERNS THE LICENSING,
INSTALLATION AND USE OF SPLUNK SOFTWARE. BY DOWNLOADING AND/OR INSTALLING SPLUNK
SOFTWARE: (A) YOU ARE INDICATING THAT YOU HAVE READ AND UNDERSTAND THIS

...

Splunk Software License Agreement 04.24.2018

Do you agree with this license? [y/n]: y

Now I had to set an administrator password for this Universal Forwarder instance. I will refer to it as "mypassword" in the examples that follow although Splunk does not echo it to the screen below.

This appears to be your first time running this version of Splunk.

An Admin password must be set before installation proceeds.
Password must contain at least:
   * 8 total printable ASCII character(s).
Please enter a new password: 
Please confirm new password: 

Splunk> Map. Reduce. Recycle.

Checking prerequisites...
Checking mgmt port [8089]: open
Creating: /opt/splunkforwarder/var/lib/splunk
Creating: /opt/splunkforwarder/var/run/splunk
Creating: /opt/splunkforwarder/var/run/splunk/appserver/i18n
Creating: /opt/splunkforwarder/var/run/splunk/appserver/modules/static/css
Creating: /opt/splunkforwarder/var/run/splunk/upload
Creating: /opt/splunkforwarder/var/spool/splunk
Creating: /opt/splunkforwarder/var/spool/dirmoncache
Creating: /opt/splunkforwarder/var/lib/splunk/authDb
Creating: /opt/splunkforwarder/var/lib/splunk/hashDb
New certs have been generated in '/opt/splunkforwarder/etc/auth'.
Checking conf files for problems...
Done
Checking default conf files for edits...
Validating installed files against hashes from '/opt/splunkforwarder/splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-x86_64-manifest'
All installed files intact.
Done
All preliminary checks passed.

Starting splunk server daemon (splunkd)...  
Done

With that done, I had to return to the Splunk Cloud Web site, and click the link to "Download Universal Forwarder Credentials" to download a splunkclouduf.spl file. As noted in the documentation, splunkclouduf.spl is a "credentials file, which contains a custom certificate for your Splunk Cloud deployment. The universal forwarder credentials are different from the credentials that you use to log into Splunk Cloud."

After downloading the splunkclouduf.spl file, I installed it. Note I pass "admin" as the user and "mypassword" as the password here. After installing I restart the universal forwarder.

root@debian:/opt/splunkforwarder/bin# ./splunk install app /home/richard/Downloads/splunkclouduf.spl -auth admin:mypassword

App '/home/richard/Downloads/splunkclouduf.spl' installed 

root@debian:/opt/splunkforwarder/bin# ./splunk restart
Stopping splunkd...
Shutting down.  Please wait, as this may take a few minutes.
.......
Stopping splunk helpers...

Done.

Splunk> Map. Reduce. Recycle.

Checking prerequisites...
Checking mgmt port [8089]: open
Checking conf files for problems...
Done
Checking default conf files for edits...
Validating installed files against hashes from '/opt/splunkforwarder/splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-x86_64-manifest'
All installed files intact.
Done
All preliminary checks passed.

Starting splunk server daemon (splunkd)...  
Done

It's time to take the final steps to get data into Splunk Cloud. I need to configure forwarder management using the Splunk Cloud Web site. Observe the input-prd-p-XXXX.cloud.splunk.com in the command below. You obtain this (mine is masked with XXXX) from the URL for your Splunk Cloud deployment, e.g., https://prd-p-XXXX.cloud.splunk.com. Note that you have to add "input-" before the fully qualified domain name used by the Splunk Cloud instance.

root@debian:/opt/splunkforwarder/bin# ./splunk set deploy-poll input-prd-p-XXXX.cloud.splunk.com:8089

Your session is invalid.  Please login.
Splunk username: admin
Password: 
Configuration updated.

Once again I restart the universal forwarder. I'm not sure if I could have done all these restarts at the end.

root@debian:/opt/splunkforwarder/bin# ./splunk restart
Stopping splunkd...
Shutting down.  Please wait, as this may take a few minutes.
.......
Stopping splunk helpers...

Done.

Splunk> Map. Reduce. Recycle.

Checking prerequisites...
Checking mgmt port [8089]: open
Checking conf files for problems...
Done
Checking default conf files for edits...
Validating installed files against hashes from '/opt/splunkforwarder/splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-x86_64-manifest'
All installed files intact.
Done
All preliminary checks passed.

Starting splunk server daemon (splunkd)...  
Done

Finally I need to tell the universal forwarder to watch some logs on this Linux system. I tell it to monitor the /var/log directory and restart one more time.

root@debian:/opt/splunkforwarder/bin# ./splunk add monitor /var/log
Your session is invalid.  Please login.
Splunk username: admin
Password: 
Added monitor of '/var/log'.

root@debian:/opt/splunkforwarder/bin# ./splunk restart

Stopping splunkd...
Shutting down.  Please wait, as this may take a few minutes.
...............
Stopping splunk helpers...

Done.

Splunk> Map. Reduce. Recycle.

Checking prerequisites...
Checking mgmt port [8089]: open
Checking conf files for problems...
Done
Checking default conf files for edits...
Validating installed files against hashes from '/opt/splunkforwarder/splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-x86_64-manifest'
All installed files intact.
Done
All preliminary checks passed.

Starting splunk server daemon (splunkd)...  
Done
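
If you want to double-check from the forwarder side that it is actually forwarding somewhere, the universal forwarder CLI should also offer a "list forward-server" command that reports active and configured-but-inactive forwards (I believe it takes the same admin credentials used above):

root@debian:/opt/splunkforwarder/bin# ./splunk list forward-server -auth admin:mypassword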

At this point I return to the Splunk Cloud Web interface and click the "search" feature. I see Splunk is indexing some data.


I run a search for "host=debian" and find my logs.


Not too bad! Have you tried Splunk Cloud? What do you think? Leave me a comment below.

Update: I installed the Universal Forwarder on FreeBSD 11.1 using the method above (except with a FreeBSD .tgz) and everything seems to be working!

by Richard Bejtlich (noreply@blogger.com) at May 07, 2018 06:26 PM

SysAdmin1138

Systemd dependencies

There is a lot of hate around Systemd in unixy circles. Like, a lot. There are many reasons for this, a short list:

  • For some reason they felt the need to reimplement daemons that have existed for years. And are finding the same kinds of bugs those older daemons found and squashed over a decade ago.
    • I'm looking at you Time-sync and DNS resolver.
  • It takes away an init system that everyone knows and is well documented in both the official documentation sense, and the unofficial 'millions of blog-posts' sense. Blog posts like this one.
  • It has so many incomprehensible edge-cases that make reasoning about the system even harder.
  • The maintainers are steely-eyed fundamentalists who know exactly how they want everything.
  • Because it runs so many things in parallel, bugs we've never had to worry about are now impossible to ignore.

So much hate. Having spent the last few weeks doing a sysv -> systemd migration, I've found another reason for that hate. And it's one I'm familiar with because I've spent so many years in the Puppet ecosystem.

People love to hate on Puppet because of the wacky non-deterministic bugs. The order resources are declared in a module is not the order in which they are applied. Puppet uses a dependency model to determine the order of things, which leads to weird bugs where a thing has worked for two weeks but suddenly stops working that way because a new change was made somewhere that changed the order of resource-application. A large part of why people like Chef over Puppet is because Chef behaves like a scripting language, where the order of the file is the order things are done in.

Guess what? Systemd uses the Puppet model of dependency! This is why it's hard to reason about. And why I, someone who has been handling these kinds of problems for years, haven't spent much time shaking my tiny fist at an uncaring universe. There has been swearing, oh yes. But of a somewhat different sort.

The Puppet Model

Puppet has two kinds of dependency. Strict ordering, and do this if that other thing does something. Which makes for four ways of linking resources.

  • require => Do this after this other thing.
  • before => Do this before this other thing.
  • subscribe => Do this after this other thing, but only if this other thing changes something.
  • notify => Do this before this other thing, and tell it you changed something.

This makes for some real power, while also making the system hard to reason about.
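
To make that concrete, here is a minimal, made-up Puppet example using two of those relationships (the resource names are invented for illustration):

package { 'nginx':
  ensure => installed,
}

file { '/etc/nginx/nginx.conf':
  ensure  => file,
  source  => 'puppet:///modules/nginx/nginx.conf',
  require => Package['nginx'],                  # strict ordering: package first
}

service { 'nginx':
  ensure    => running,
  subscribe => File['/etc/nginx/nginx.conf'],   # restart only if the file changed
}

No matter where these three resources sit in the manifest, the graph Puppet builds from require/subscribe decides the order they run in, which is exactly the property that makes the system both powerful and hard to reason about.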

Thing is, systemd goes a step further

The Systemd Model

Systemd also has dependencies, but it was also designed to run as much in parallel as possible. Puppet was written in Ruby, so it has strong single-threaded tendencies. Systemd is multi-threaded. Multi-threaded systems are harder to reason about in general. Add dependency ordering on top of multi-threading issues and you get a sheer cliff of learning before you can have a hope of following along. Even better (worse), systemd has more ways of defining relationships.

  • Before= Pure ordering: if this unit and the named units are being started together, the named units wait until this unit has finished starting. By itself it doesn't pull the named units in or care whether they succeed.
  • After= The reverse: this unit waits to start until the named units have finished starting (whether they succeeded or not). Again, ordering only; the requirement-style options below are what actually start or stop things together.
  • Requires= The named units will get started if this one is, and do so at the same time. Not only that, but if the named units are explicitly stopped, this one will be stopped as well. For puppet-heads, this breaks things since this works backwards.
  • BindsTo= Does everything Requires does, but will also stop this unit if the named unit stops for any reason, not just explicit stops.
  • Wants= Like Requires=, but less picky. The named units will get started, but this unit doesn't care if they fail to start or fall over later.
  • Requisite= Like Require, but will fail immediately if the named services aren't started yet. Think of mount units not starting unless the device unit is already started.
  • Conflicts= A negative dependency. Turn this unit off if the named unit is started. And turn this other unit off if this unit is started.

There are several more I'm not going into. This is a lot, and some of these work independently. The documentation even says:

It is a common pattern to include a unit name in both the After= and Requires= options, in which case the unit listed will be started before the unit that is configured with these options.

Using both After and Requires means that the named units need to get all the way done (After=) before this unit is started. And if this unit is started, the named units need to get started as well (Require=).

Hence, in many cases it is best to combine BindsTo= with After=.

Using both configures a hard dependency relationship. After= means the other unit needs to be all the way started before this one is started. BindsTo= makes it so that this unit is only ever in an active state when the unit named in both BindsTo= and After= is in an active state. If that other unit fails or goes inactive, this one will as well.
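
As a concrete example (unit and binary names made up), a worker that should only ever run while its message broker is up might be wired together like this:

# /etc/systemd/system/myapp-worker.service
[Unit]
Description=Example worker that lives and dies with its broker
# Ordering only: wait until the broker has finished starting.
After=my-broker.service
# Lifecycle: pull the broker in when this unit starts, and stop this unit
# whenever the broker stops or fails for any reason.
BindsTo=my-broker.service

[Service]
ExecStart=/usr/local/bin/myapp-worker

[Install]
WantedBy=multi-user.target

Swap BindsTo= for Requires= if you only want to follow explicit stops of the broker rather than failures.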

There is also a concept missing from Puppet, and that's when the dependency fires. After/Before are trailing-edge triggers: they fire on completion, which is how Puppet works. Most of the rest are leading-edge triggered, where the dependency is satisfied as soon as the named units start. This is how you get parallelism in an init-system, and why the weirder dependencies are often combined with either Before or After.


Systemd hate will continue for the next 10 or so years, at least until most Linux engineers have worked with it long enough to stop grumbling about how nice the olden days were.

It also means that fewer people will be writing startup services due to the complexity of doing anything other than 'start this after this other thing' ordering.

by SysAdmin1138 at May 07, 2018 06:20 PM

May 02, 2018

ma.ttias.be

DNS Spy now checks for the “Null MX”

The post DNS Spy now checks for the “Null MX” appeared first on ma.ttias.be.

A small but useful addition to the scoring system of DNS Spy: support for the Null MX record.

Internet mail determines the address of a receiving server through the DNS, first by looking for an MX record and then by looking for an A/AAAA record as a fallback.

Unfortunately, this means that the A/AAAA record is taken to be mail server address even when that address does not accept mail.

The No Service MX RR, informally called "null MX", formalizes the existing mechanism by which a domain announces that it accepts no mail, without having to provide a mail server; this permits significant operational efficiencies.

Source: RFC 7505 -- A "Null MX" No Service Resource Record for Domains That Accept No Mail
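
In zone-file terms, a Null MX is simply an MX record with preference 0 and a "." exchange. For a hypothetical domain that accepts no mail, it would look something like this:

nomail.example.com.    3600    IN    MX    0 .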

Give it a try at the DNS Spy Scan page.

The post DNS Spy now checks for the “Null MX” appeared first on ma.ttias.be.

by Mattias Geniar at May 02, 2018 05:45 PM

May 01, 2018

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – April 2018

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts based on last month’s visitor data (excluding other monthly or annual round-ups):
  1. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our research on developing security monitoring use cases here – and we just UPDATED IT FOR 2018.
  2. “Simple Log Review Checklist Released!” is often at the top of this list – this rapidly aging checklist is still a useful tool for many people. “On Free Log Management Tools” (also aged quite a bit by now) is a companion to the checklist (updated version)
  3. “Updated With Community Feedback SANS Top 7 Essential Log Reports DRAFT2” is about the top log reports project of 2008-2013; I think these are still very useful in response to “what reports will give me the best insight from my logs?”
  4. Again, my classic PCI DSS Log Review series is extra popular! The series of 18 posts cover a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book  – note that this series is even mentioned in some PCI Council materials. 
  5. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009 (oh, wow, ancient history!). Is it relevant now? You be the judge.  Succeeding with SIEM requires a lot of work, whether you paid for the software, or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” … 
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has more than 7X the traffic of this blog]: 

Critical reference posts:
Current research on testing security:
Current research on threat detection “starter kit”
Just finished research on SOAR:
Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Other posts in this endless series:

by Anton Chuvakin (anton@chuvakin.org) at May 01, 2018 06:11 PM

Electricmonk.nl

A short security review of Bitwarden

Bitwarden is an open source online password manager:

The easiest and safest way for individuals, teams, and business organizations to store, share, and sync sensitive data.

Bitwarden offers both a cloud hosted and on-premise version. Some notes on the scope of this blog post and disclaimers:

  • I only looked at the cloud hosted version.
  • This security review is not exhaustive; I only spent a few minutes reviewing various things.
  • I'm not a security researcher, just a paranoid enthusiast. If you find anything wrong with this blog post, please contact me at ferry DOT boender (AT) gmaildotcom.

Here are my findings:

Encryption password sent over the wire

There appears to be no distinction between the authentication password and encryption password.

When logging in, the following HTTP POST is made to Bitwarden's server:

client_id: web
grant_type: password
password: xFSJdHvKcrYQA0KAgOlhxBB3Bpsuanc7bZIKTpskiWk=
scope: api offline_access
username: some.person@gmail.com

That's a base64 encoded password. (Don't worry, I anonymized all secrets in this post; besides, it's all throw-away passwords anyway). Let's see what it contains:

>>> import base64
>>> base64.b64decode('xFSJdHvKcrYQA0KAgOlhxBB3Bpsuanc7bZIKTpskiWk=')
b'p\x54\xde\x35\xb6\x90\x992\x63bKn\x7f\xfbb\xb2\x94t\x1b\xe9f\xez\xeaz}e\x142X#\xbd\x1c'

Okay, at least that's not my plain text password. It is encoded, hashed or encrypted somehow, but I'm not sure how. Still, it makes me nervous that my password is being sent over the wire. The master password used for encryption should never leave a device, in any form. I would have expected two passwords here, perhaps: one for authentication and one for encryption.

The reason it was implemented this way is probably because of the "Organizations" feature, which lets you share passwords with other people. Sharing secrets among people is probably hard to do in a secure way. I'm no cryptography expert, but there are probably ways to do this more securely using asymmetric encryption (public and private keys), which Bitwarden doesn't appear to be using.

Bitwarden has a FAQ entry about its use of encryption, which claims that passwords are never sent over the wire unencrypted or unhashed:

Bitwarden always encrypts and/or hashes your data on your local device before it is ever sent to the cloud servers for syncing. The Bitwarden servers are only used for storing encrypted data. It is not possible to get your unencrypted data from the Bitwarden cloud servers.

The FAQ entry on hashing is also relevant:

Bitwarden salts and hashes your master password with your email address on the client (your computer/device) before it is transmitted to our servers. Once the server receives the hashed password from your computer/device it is then salted again with a cryptographically secure random value, hashed again and stored in our database. This process is repeated and hashes are compared every time you log in.

The hashing functions that are used are one way hashes. This means that they cannot be reverse engineered by anyone at Bitwarden to reveal your true master password. In the hypothetical event that the Bitwarden servers were hacked and your data was leaked, the data would have no value to the hacker.
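
To make the FAQ's description a bit more concrete, here is a rough Python sketch of what that general style of client-side scheme looks like. To be clear: this is my guess at the shape of it, not Bitwarden's actual code, and the iteration counts and parameters are invented for illustration:

import base64
import hashlib

def derive_login_hash(master_password: str, email: str) -> str:
    # Stretch the master password, using the (lower-cased) email as the salt.
    key = hashlib.pbkdf2_hmac("sha256",
                              master_password.encode("utf-8"),
                              email.lower().encode("utf-8"),
                              100_000)  # iteration count invented for this sketch
    # Derive the value that actually goes over the wire from that key, so the
    # server only ever sees something one-way derived from it.
    login_hash = hashlib.pbkdf2_hmac("sha256", key,
                                     master_password.encode("utf-8"), 1)
    return base64.b64encode(login_hash).decode("ascii")

print(derive_login_hash("correct horse battery staple", "some.person@gmail.com"))

If the client really does something along these lines, then what travels over the wire is not reversible to the master password, which matches the FAQ's claim. It does not, however, address the caveat below.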

However, there's a major caveat here which they don't mention. All of the encryption is done client-side by Javascript loaded from various servers and CDNs. This means that an attacker who gains control over any of these servers (or man-in-the-middle's them somehow) can inject any javascript they like, and obtain your password that way.

Indiscriminate allowance / loading of external resources

The good news is that Bitwarden uses Content-Security-Policy. The bad news is that it allows the loading of resources from a variety of untrusted sources. uMatrix shows the type of resources it's trying to load from various sources:

Here's what the Content-Security-Policy looks like:

content-security-policy:
   default-src
      'self';
   script-src
      'self'
      'sha256-ryoU+5+IUZTuUyTElqkrQGBJXr1brEv6r2CA62WUw8w='
      https://www.google-analytics.com
      https://js.stripe.com
      https://js.braintreegateway.com
      https://www.paypalobjects.com
      https://maxcdn.bootstrapcdn.com
      https://ajax.googleapis.com;
   style-src
      'self'
      'unsafe-inline'
      https://maxcdn.bootstrapcdn.com
      https://assets.braintreegateway.com
      https://*.paypal.com
      https://fonts.googleapis.com;
   img-src
      'self'
      data:
      https://icons.bitwarden.com
      https://*.paypal.com
      https://www.paypalobjects.com
      https://q.stripe.com
      https://haveibeenpwned.com
      https://chart.googleapis.com
      https://www.google-analytics.com;
   font-src
      'self'
      https://maxcdn.bootstrapcdn.com
      https://fonts.gstatic.com;
   child-src
      'self'
      https://js.stripe.com
      https://assets.braintreegateway.com
      https://*.paypal.com
      https://*.duosecurity.com;
   frame-src
      'self'
      https://js.stripe.com
      https://assets.braintreegateway.com
      https://*.paypal.com
      https://*.duosecurity.com;

Roughly translated, it allows indiscriminate loading and executing of scripts, css, web workers (background threads) and inclusion of framed content from a wide variety of untrusted sources such as CDNs, Paypal, Duosecurity, Braintreegateway, Google, etc. Some of these I know, some I don't. Trust I have in none of them.

It would take too long to explain why this is a bad idea, but the gist of it is that the more resources you load and allow from different sources, the bigger the attack surface becomes. Perhaps these are perfectly secure (right now…), but an important part of security is the developers' security mindset. Some of these resources could have easily been hosted on the same origin servers. Some of these resources should only be allowed to run from payment pages. It shows sloppy configuration of the Content-Security-Policy, namely site-wide configuration in the web server (probably) rather than being determined on a URL-by-URL basis.
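
To illustrate what "URL by URL" could look like, here is a hypothetical nginx sketch (hostnames, paths and sources are invented for the example; this is not Bitwarden's actual configuration): a strict default policy for the whole application, relaxed only on the payment pages.

server {
    listen 443 ssl;
    server_name vault.example.com;               # made-up hostname
    ssl_certificate     /etc/ssl/example.pem;    # placeholder paths
    ssl_certificate_key /etc/ssl/example.key;

    # Strict default for the whole application: own origin only.
    add_header Content-Security-Policy "default-src 'self'" always;

    location /billing/ {
        # Only the payment pages may pull in the payment provider's scripts and frames.
        add_header Content-Security-Policy "default-src 'self'; script-src 'self' https://js.stripe.com; frame-src https://js.stripe.com" always;
    }
}

(In nginx, add_header directives declared in a location replace the ones inherited from the server block, which is the behaviour you want here.)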

The actual client-side encryption library is loaded from vault.bitwarden.com, which is good. However, the (possibility of) inclusion of scripts from other sources negates any security benefits of doing so.

The inclusion of Google analytics in a password manager is, in my opinion, inexcusable. It's not required functionality for the application, so it shouldn't be in there.

New password entry is sent securely

When adding a new authentication entry, the entry appears to be client-side encrypted in some way before sending it to the server:

{
    "name": "2.eD4fFLYUWmM6sgVDSA9pTg==|SNzQjLitpA5K+6qrBwC7jw==|DlfVCnVdZA9+3oLej4FHSQwwdo/CbmHkL2TuwnfXAoI=", 
    "organizationId": null, 
    "fields": null, 
    "notes": null, 
    "favorite": false, 
    "login": {
        "username": null, 
        "password": "2.o4IO/yzz6syip4UEaU4QpA==|LbCyLjAOHa3m2wopsqayYK9O7Q5aqnR8nltUgylwSOo=|6ajVAh0r9OaBs+NgLKrTd+j3LdBLKBUbs/q8SE6XvUE=", 
        "totp": null
    }, 
    "folderId": null, 
    "type": 1
}

It's base64 again, and it decodes into the same kind of obscure binary string as the password sent when logging in. I have not spent time looking at how exactly the encoding / encryption is happening, so I cannot claim that this is actually secure. So keep that in mind. It does give credence to Bitwarden's claims that all sensitive data is encrypted client-side before sending it to the server.

Disclosure of my email address to a third party without my consent

I clicked on the "Data breach report" link on the left, and Bitwarden immediately sent my email address to https://haveibeenpwned.com. No confirmation, no nothing; it was disclosed to a third party immediately. Well, actually, since I use uMatrix to firewall my browser, it wasn't and I had to explicitly allow it to do so, but even most security nerds don't use uMatrix. 

That's not cool. Don't disclose my info to third parties without my consent.

Developer mindset

One of, if not the, most important aspects is the developer mindset. That is, do they care about security and are they knowledgeable in the field?

Bitwarden appears to know what they're doing. They have a security policy and run a bug bounty program. Security incidents appear to be solved quickly. I'd like to see more documentation on how the encryption, transfer and storage of secrets works. Right now, there are some FAQ entries, but it's all promises that give me no insight into where and how the applied security might break down.

One thing that bothers me is that they do not disclose any of the security trade-offs they made and how they impact the security of your secrets. I'm always wary when claims of perfect security are made, whether explicitly, or by omission of information. There are obvious problems with client-side javascript encryption, which every developer and user with a reasonable understanding of web development recognises. No mention of this is made. Instead, security concerns are waved away with "everything is encrypted on your device!". That's nice, but if attackers can control the code that does the encryption, all is lost.

Please note that I'm not saying that client-side javascript encryption is a bad decision! It's a perfectly reasonable trade-off between the convenience of being able to access your secrets on all your devices and a more secure way of managing your passwords. However, this trade-off should be disclosed prominently to users.

Conclusion

So, is Bitwarden (Cloud) secure and should you use it? Unfortunately, I can't give you any advice. It all depends on your requirements. All security is a tradeoff between usability, convenience and security.

I did this review because my organisation is looking into a self-hosted Open Source password manager to manage our organisation's secrets. Would I use this to keep my personal passwords in? The answer is: no. I use an offline Keepass, which I manually sync from my laptop to my phone every now and then. This is still the most secure way of managing passwords that I do not need to share with anyone. However, that's not the use-case that I reviewed Bitwarden for. So would I use it to manage our organisation's secrets? Perhaps, the jury is still out on that. I'll need to look at the self-hosted version to see if it also includes Javascript from unreliable sources. If so, I'd have to say that, no, I would not recommend Bitwarden.

by admin at May 01, 2018 07:06 AM

April 30, 2018

ma.ttias.be

Certificate Transparency logging now mandatory

The post Certificate Transparency logging now mandatory appeared first on ma.ttias.be.

All certificates are now required to be logged in publicly available logs (aka "Certificate Transparency").

Since January 2015, Chrome has required that Extended Validation (EV) certificates be CT-compliant in order to receive EV status.

In April 2018, this requirement will be extended to all newly-issued publicly-trusted certificates -- DV, OV, and EV -- and certificates failing to comply with this policy will not be recognized as trusted when evaluated by Chrome.

Source: Certificate Transparency Enforcement in Google Chrome -- Google Groups

In other words: if Chrome encounters a certificate, issued after April 2018, that isn't signed by a Certificate Transparency log, the certificate will be marked as insecure.
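
If you want to check whether a given certificate carries embedded SCTs (one of the ways CT compliance can be delivered; SCTs can also arrive via a TLS extension or OCSP stapling), a reasonably recent OpenSSL should print them in the certificate text. Something along these lines, assuming your openssl build includes CT support:

$ echo | openssl s_client -connect example.com:443 -servername example.com 2>/dev/null | openssl x509 -noout -text | grep -A 2 "CT Precertificate SCTs"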

Don't want to have this happen to you out of the blue? Monitor your sites and their certificate health via Oh Dear!.

The post Certificate Transparency logging now mandatory appeared first on ma.ttias.be.

by Mattias Geniar at April 30, 2018 01:42 PM

April 28, 2018

Homo-Adminus

Edge Web Server Testing at Swiftype

This article has been originally posted on Swiftype Engineering blog.


For any modern technology company, a comprehensive application test suite is an absolute necessity. Automated testing suites allow developers to move faster while avoiding any loss of code quality or system stability. Software development has seen great benefit come from the adoption of automated testing frameworks and methodologies; however, the culture of automated testing has neglected one key area of the modern web application serving stack: web application edge routing and multiplexing rulesets.

From modern load balancer appliances that allow TCL-based rule sets, to locally or remotely hosted Varnish VCL rules, to the power and flexibility that Nginx and OpenResty make available through Lua, edge routing rulesets have become a vital part of application serving controls.

Over the past decade or so, it has become possible to incorporate more and more logic into edge web server infrastructures. Almost every modern web server has support for scripting, enabling developers to make their edge servers smarter than ever before. Unfortunately, the application logic configured within web servers is often much harder to test than logic hosted directly in application code, and so too often software teams resort to manual testing or, worse, use their customers as testers by shipping changes to production without any edge routing testing.

In this post, I would like to explain the approach Swiftype has taken to ensure that our test suites account for our use of complex edge web server logic
to manage our production traffic flow, and thus that we can confidently deploy changes to our application infrastructure with little or no risk.

Our Web Infrastructure

Before I go into details of our edge web server configuration testing, it may be helpful to share an overview of the infrastructure behind our web services and applications.

Swiftype has evolved from a relatively simple Rails monolith and is still largely powered by a set of Ruby applications served by Unicorn application servers. To balance traffic between the multitude of application instances, we use Haproxy (mainly for its observability features and the fair load balancing implementation). Finally, there is an OpenResty (nginx+lua) layer at the edge of our infrastructure that is responsible for many key functions: SSL termination and enforcement, rate limiting, as well as providing flexible traffic management and routing functionality (written in Lua) customized specifically for the Swiftype API.

Here is a simple diagram of our web application infrastructure:

Swiftype web infrastructure overview

Testing Edge Web Servers

Swiftype’s edge web server configuration contains thousands of lines of code: from Nginx configs, to custom templates rendered during deployment, to complex Lua logic used to manage production API traffic. Any mistake in this configuration, if not caught in testing, could lead to an outage at our edge, and considering that 100% of our API traffic is served through this layer, any outage at the edge is likely to be very impactful to our customers and our business. This is why we have invested time and resources in building a system that allows us to test our edge configuration changes in development and on CI before they are deployed to production systems.

Testing Workflow Overview

The first step in safely introducing change is ensuring that development and testing environments are quarantined from production environments. To do this we have created an “isolated” runtime mode for our edge web server stack. All changes to our edge configurations are first developed and run in this “isolated” mode, which has no references to production backend infrastructure, so developers are able to iterate very quickly in a local environment without fear of harmful repercussions. All tests written to run in the “isolated” mode employ a mock server to emulate production backends and focus primarily on unit-testing the specific new features being implemented.

When we are confident enough in our unit-tested set of changes, we can run the same set of tests in an “acceptance testing” mode, in which the mock server used in isolated tests is replaced with a Haproxy load balancer with access to production networks. Running the tests in this mode lets us ensure with the highest degree of certainty that our changes will work in a real production environment, since the whole stack is exercised while the test suite runs.

Testing Environment Overview

Our testing environment employs Docker containers to serve in place of our production web servers. The test environment is comprised of the following components:

  • A loopback network interface on which a full complement of production IPs is configured to account for every service we are planning to test (e.g. a service foo.swiftype.com pointing to an IP address 10.1.0.x in production is tested in a local “isolated” testing environment with IP 10.1.0.x assigned to an alias on the local loopback interface). This allows us to perform end-to-end testing: DNS resolution, TCP service connections to a specific IP address, etc., without needing access to production or any local /etc/hosts or name resolution changes.
  • For use cases where we are testing changes that are not represented in DNS (for example, when preparing edge servers for serving traffic currently handled by a different service), we may still employ local /etc/hosts entries to point the DNS name for a service to a local IP address for the period of testing. In this scenario, we ensure that our tests have been written in a way that is independent of the DNS configuration, and thus that the tests can be reused at a later date, or when the configuration has been deployed to production.
  • An OpenResty server instance with the configuration we need to test.
  • A test runner process (based on RSpec and a custom framework for writing our tests).
  • An optional mock server. (As noted above, this might run in Docker in a local test environment or in CI, and is typically driven from the test runner process.) In “isolated” mode it emulates an external application or service and serves in place of production backends; in “acceptance testing” mode it is replaced by a local Haproxy instance running a production configuration, which may even route traffic to real production backends. A minimal sketch of such a mock backend follows this list.
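To make the mock backend role concrete, here is a minimal sketch of such a stub server in Python; the port and the canned response are hypothetical, and Swiftype’s actual tooling is RSpec-based rather than this stdlib server.

# Minimal stub backend: answers every request with a canned JSON body,
# standing in for the production backend on the port Nginx proxies to.
# The port and payload are made up for illustration.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubBackend(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"status": "ok", "path": self.path}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 9200), StubBackend).serve_forever()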

Isolated Testing Walkthrough

Here is how a test for a hypothetical service foo.swiftype.com (registered in DNS as 1.2.3.4) is performed in an isolated environment:

  1. We automatically assign 1.2.3.4 as an alias on a loopback interface.
  2. We start a mock server listening on localhost, configured to respond on the same port used by the foo.swiftype.com Nginx backend (in production, Haproxy would be on that port) with a specific stub response.
  3. Our test performs a DNS resolution for foo.swiftype.com, receives 1.2.3.4 as the IP of the service, connects to the local Nginx instance listening on 1.2.3.4 (bound to a loopback interface) and performs a test call.
  4. Nginx, receiving the test request, performs all configured operations and forwards the request to a backend, which in this case is handled by the local mock server. The call result is then returned by Nginx to the test runner.
  5. The test runner performs all defined testing against the server response: these tests can be very thorough, as the test runner has access to the response code, all headers, and the response body, and can thus confirm that all returned data meets each test’s specifications before concluding whether the process as a whole has passed or failed validation.
  6. Specific to isolated testing: in some use cases, we may validate the state of the mock server, verifying that it has received all the calls we expected it to receive and that each call carried the data and headers we expected. This can be very useful for testing changes where our web layer has been configured to alter requests (rewrite, add or remove headers, etc.) prior to passing them to a given backend.
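Putting steps 3 to 5 together, a test in this setup boils down to resolving the service name, calling the local Nginx instance, and asserting on the proxied response. Swiftype’s real runner is RSpec with a custom framework; the sketch below is a hypothetical pytest equivalent (the /health endpoint and the expected body are made up) that assumes the loopback alias and the stub backend above are already in place.

# Hypothetical pytest-style test: resolve the service name, call the
# local Nginx bound to the loopback alias, and check the proxied answer.
import socket
import requests

def test_foo_service_routes_to_backend():
    ip = socket.gethostbyname("foo.swiftype.com")   # 1.2.3.4, aliased on loopback
    resp = requests.get(
        f"http://{ip}/health",
        headers={"Host": "foo.swiftype.com"},
        timeout=5,
    )
    assert resp.status_code == 200
    assert resp.json()["status"] == "ok"            # the stub backend's canned body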

Here is a diagram illustrating a test running in an isolated environment:

An isolated testing environment

Acceptance Testing Walkthrough

When all of our tests have passed in our “isolated” environment, and we want to make sure our configurations work in a non-mock, physically “production-like” environment (or during our periodic acceptance test runs that must also run in a production mirroring environment), we use an “acceptance testing” mode. In this mode, we replace our mock server with a real production Haproxy load balancer instance talking to real production backends (or a subset of backends representing a real production application).

Here is what happens during an acceptance test for the same hypothetical service foo.swiftype.com (registered in DNS as 1.2.3.4):

  1. We automatically assign 1.2.3.4 as an alias on a loopback interface.
  2. We start a dedicated production Haproxy instance, with a configuration pointing to production backend applications, and bind this dedicated haproxy instance to localhost. (This exactly mirrors what we do in production, where haproxy is always a dedicated localhost service).
  3. Our test performs DNS resolution for foo.swiftype.com, receives 1.2.3.4 as the IP of the service, connects to a local Nginx instance listening on 1.2.3.4 (bound to a loopback interface), and performs a test call.
  4. Nginx, receiving a test request, performs whatever operations are defined and forwards it to a local Haproxy backend, which in turn sends the request to a production application instance. When a call is complete, the result is returned by Nginx to the test runner.
  5. The test runner performs all defined checks on the response and determines whether the call and response pass or fail the test.

Here is a diagram illustrating a test call made in an acceptance testing environment:

A test call within the acceptance testing environment

Conclusion

Using our edge web server testing framework over the past few years, we have been able to make hundreds of high-risk changes in our production edge infrastructure without any significant incident caused by deploying an untested configuration update. The framework gives us the assurance we need to make dramatic changes to our web application edge routing (which affects every production request) while remaining confident in our ability to introduce those changes safely.

We highly recommend that every engineering team tasked with building or operating complex edge server configurations adopt some level of testing that allows the team to iterate faster without fear of compromising these critical components.

by Oleksiy Kovyrin at April 28, 2018 08:38 PM

April 26, 2018

Cryptography Engineering

A few thoughts on Ray Ozzie’s “Clear” Proposal

Yesterday I happened upon a Wired piece by Steven Levy that covers Ray Ozzie’s proposal for “CLEAR”. I’m quoted at the end of the piece (saying nothing much), so I knew the piece was coming. But since many of the things I said to Levy were fairly skeptical — and most didn’t make it into the piece — I figured it might be worthwhile to say a few of them here.

Ozzie’s proposal is effectively a key escrow system for encrypted phones. It’s receiving attention now due to the fact that Ozzie has a stellar reputation in the industry, and due to the fact that it’s been lauded by law enforcement (and some famous people like Bill Gates). Ozzie’s idea is just the latest bit of news in this second edition of the “Crypto Wars”, in which the FBI and various law enforcement agencies have been arguing for access to end-to-end encryption technologies — like phone storage and messaging — in the face of pretty strenuous opposition by (most of) the tech community.

In this post I’m going to sketch a few thoughts about Ozzie’s proposal, and about the debate in general. Since this is a cryptography blog, I’m mainly going to stick to the technical, and avoid the policy details (which are substantial). Also, since the full details of Ozzie’s proposal aren’t yet public — some are explained in the Levy piece and some in this patent — please forgive me if I get a few details wrong. I’ll gladly correct.

[Note: I’ve updated this post in several places in response to some feedback from Ray Ozzie. For the updated parts, look for the *. Also, Ozzie has posted some slides about his proposal.]

How to Encrypt a Phone

The Ozzie proposal doesn’t try to tackle every form of encrypted data. Instead it focuses like a laser on the simple issue of encrypted phone storage. This is something that law enforcement has been extremely concerned about. It also represents the (relatively) low-hanging fruit of the crypto debate, for essentially two reasons: (1) there are only a few phone hardware manufacturers, and (2) access to an encrypted phone generally only takes place after law enforcement has gained physical access to it.

I’ve written about the details of encrypted phone storage in a couple of previous posts. A quick recap: most phone operating systems encrypt a large fraction of the data stored on your device. They do this using an encryption key that is (typically) derived from the user’s passcode. Many recent phones also strengthen this key by “tangling” it with secrets that are stored within the phone itself — typically with the assistance of a secure processor included in the phone. This further strengthens the device against simple password guessing attacks.

The upshot is that the FBI and local law enforcement have not — until very recently (more on that further below) — been able to obtain access to many of the phones they’ve obtained during investigations. This is due to the fact that, by making the encryption key a function of the user’s passcode, manufacturers like Apple have effectively rendered themselves unable to assist law enforcement.

The Ozzie Escrow Proposal

Ozzie’s proposal is called “Clear”, and it’s fairly straightforward. Effectively, it calls for manufacturers (e.g., Apple) to deliberately put themselves back in the loop. To do this, Ozzie proposes a simple form of key escrow (or “passcode escrow”). I’m going to use Apple as our example in this discussion, but obviously the proposal will apply to other manufacturers as well.

Ozzie’s proposal works like this:

  1. Prior to manufacturing a phone, Apple will generate a public and secret “keypair” for some public key encryption scheme. They’ll install the public key into the phone, and keep the secret key in a “vault” where hopefully it will never be needed.
  2. When a user sets a new passcode onto their phone, the phone will encrypt a passcode under the Apple-provided public key. This won’t necessarily be the user’s passcode, but it will be an equivalent passcode that can unlock the phone.* It will store the encrypted result in the phone’s storage.
  3. In the unlikely event that the FBI (or police) obtain the phone and need to access its files, they’ll place the phone into some form of law enforcement recovery mode. Ozzie describes doing this with some special gesture, or “twist”. Alternatively, Ozzie says that Apple itself could do something more complicated, such as performing an interactive challenge/response with the phone in order to verify that it’s in the FBI’s possession.
  4. The phone will now hand the encrypted passcode to law enforcement. (In his patent, Ozzie suggests it might be displayed as a barcode on a screen.)
  5. The law enforcement agency will send this data to Apple, who will do a bunch of checks (to make sure this is a real phone and isn’t in the hands of criminals). Apple will access their secret key vault, and decrypt the passcode. They can then send this back to the FBI.
  6. Once the FBI enters this code, the phone will be “bricked”. Let me be more specific: Ozzie proposes that once activated, a secure chip inside the phone will permanently “blow” several JTAG fuses monitored by the OS, placing the phone into a locked mode. Once it reads those fuses as blown, the OS will never again overwrite its own storage, will never again talk to any network, and will become effectively unable to operate as a normal phone.

When put into its essential form, this all seems pretty simple. That’s because it is. In fact, with the exception of the fancy “phone bricking” stuff in step (6), Ozzie’s proposal is a straightforward example of key escrow — a proposal that people have been making in various guises for many years. The devil is always in the details.
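To make the escrow mechanics of steps 1, 2 and 5 concrete, here is a minimal Python sketch using RSA-OAEP from the cryptography package. The key size, the derived recovery passcode and the storage format are illustrative assumptions rather than details of Ozzie’s actual design, and the bricking logic of step 6 is omitted entirely.

# Minimal key-escrow sketch (illustrative only, not Ozzie's actual scheme).
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Step 1: the manufacturer generates a keypair; the private key goes in the vault.
vault_private_key = rsa.generate_private_key(public_exponent=65537, key_size=3072)
device_public_key = vault_private_key.public_key()

# Step 2: on passcode change, the phone encrypts an equivalent recovery
# passcode under the manufacturer's public key and stores the blob locally.
recovery_passcode = os.urandom(16).hex()      # stand-in for the real passcode
oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
escrow_blob = device_public_key.encrypt(recovery_passcode.encode(), oaep)

# Step 5: inside the vault, the manufacturer decrypts the blob for law enforcement.
recovered = vault_private_key.decrypt(escrow_blob, oaep).decode()
assert recovered == recovery_passcode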

A vault of secrets

If we picture how the Ozzie proposal will change things for phone manufacturers, the most obvious new element is the key vault. This is not a metaphor. It literally refers to a giant, ultra-secure vault that will have to be maintained individually by different phone manufacturers. The security of this vault is no laughing matter, because it will ultimately store the master encryption key(s) for every single device that manufacturer ever makes. For Apple alone, that’s about a billion active devices.

Does this vault sound like it might become a target for organized criminals and well-funded foreign intelligence agencies? If it sounds that way to you, then you’ve hit on one of the most challenging problems with deploying key escrow systems at this scale. Centralized key repositories — that can decrypt every phone in the world — are basically a magnet for the sort of attackers you absolutely don’t want to be forced to defend yourself against.

So let’s be clear. Ozzie’s proposal relies fundamentally on the ability of manufacturers to secure extremely valuable key material for a massive number of devices against the strongest and most resourceful attackers on the planet. And not just rich companies like Apple. We’re also talking about the companies that make inexpensive phones and have a thinner profit margin. We’re also talking about many foreign-owned companies like ZTE and Samsung. This is key material that will be subject to near-constant access by the manufacturer’s employees, who will have to access these keys regularly in order to satisfy what may be thousands of law enforcement access requests every month.

If a single attacker ever gains access to that vault and is able to extract a few “master” secret keys (Ozzie says that these master keys will be relatively small in size*) then the attackers will gain unencrypted access to every device in the world. Even better: if the attackers can do this surreptitiously, you’ll never know they did it.

Now in fairness, this element of Ozzie’s proposal isn’t really new. In fact, this key storage issue is an inherent aspect of all massive-scale key escrow proposals. In the general case, the people who argue in favor of such proposals typically make two arguments:

  1. We already store lots of secret keys — for example, software signing keys — and things work out fine. So this isn’t really a new thing.
  2. Hardware Security Modules.

Let’s take these one at a time.

It is certainly true that software manufacturers do store secret keys, with varying degrees of success. For example, many software manufacturers (including Apple) store secret keys that they use to sign software updates. These keys are generally locked up in various ways, and are accessed periodically in order to sign new software. In theory they can be stored in hardened vaults, with biometric access controls (as the vaults Ozzie describes would have to be.)

But this is pretty much where the similarity ends. You don’t have to be a technical genius to recognize that there’s a world of difference between a key that gets accessed once every month — and can be revoked if it’s discovered in the wild —  and a key that may be accessed dozens of times per day and will be effectively undetectable if it’s captured by a sophisticated adversary.

Moreover, signing keys leak all the time. The phenomenon is so common that journalists have given it a name: it’s called “Stuxnet-style code signing”. The name derives from the fact that the Stuxnet malware — the nation-state malware used to sabotage Iran’s nuclear program — was authenticated with valid code signing keys, many of which were (presumably) stolen from various software vendors. This practice hasn’t remained with nation states, unfortunately, and has now become common in retail malware.

The folks who argue in favor of key escrow proposals generally propose that these keys can be stored securely in special devices called Hardware Security Modules (HSMs). Many HSMs are quite solid. They are not magic, however, and they are certainly not up to the threat model that a massive-scale key escrow system would expose them to. Rather than being invulnerable, they continue to cough up vulnerabilities like this one. A single such vulnerability could be game-over for any key escrow system that used it.

In some follow up emails, Ozzie suggests that keys could be “rotated” periodically, ensuring that even after a key compromise the system could renew security eventually. He also emphasizes the security mechanisms (such as biometric access controls) that would be present in such a vault. I think that these are certainly valuable and necessary protections, but I’m not convinced that they would be sufficient.

Assume a secure processor

Let’s suppose for a second that an attacker does get access to the Apple (or Samsung, or ZTE) key vault. In the section above I addressed the likelihood of such an attack. Now let’s talk about the impact.

Ozzie’s proposal has one significant countermeasure against an attacker who wants to use these stolen keys to illegally spy on (access) your phone. Specifically, should an attacker attempt to illegally access your phone, the phone will be effectively destroyed. This doesn’t protect you from having your files read — that horse has fled the stable — but it should alert you to the fact that something fishy is going on. This is better than nothing.

This measure is pretty important, not only because it protects you against evil maid attacks. As far as I can tell, this protection is pretty much the only measure by which theft of the master decryption keys might ever be detected. So it had better work well.

The details on how this might work aren’t very clear in Ozzie’s patent, but the Wired article describes it; the description appears to be based on Ozzie’s presentation at Columbia University.

What Ozzie appears to describe here is a secure processor contained within every phone. This processor would be capable of securely and irreversibly enforcing that, once law enforcement has accessed a phone, that phone could no longer be placed into an operational state.

My concern with this part of Ozzie’s proposal is fairly simple: this processor does not currently exist. To explain why this matters, let me tell a story.

Back in 2013, Apple began installing a secure processor in each of their phones. While this secure processor (called the Secure Enclave Processor, or SEP) is not exactly the same as the one Ozzie proposes, the overall security architecture seems very similar.

One main goal of Apple’s SEP was to limit the number of passcode guessing attempts that a user could make against a locked iPhone. In short, it was designed to keep track of each (failed) login attempt and keep a counter. If the number of attempts got too high, the SEP would make the user wait a while — in the best case — or actively destroy the phone’s keys. This last protection is effectively identical to Ozzie’s proposal. (With some modest differences: Ozzie proposes to “blow fuses” in the phone, rather than erasing a key; and he suggests that this event would be triggered by entry of a recovery passcode.*)

For several years, the SEP appeared to do its job fairly effectively. Then in 2017, everything went wrong. Two firms, Cellebrite and Grayshift, announced that they had products that effectively unlocked every single Apple phone, without any need to dismantle the phone. Digging into the details of this exploit, it seems very clear that both firms — working independently — have found software exploits that somehow disable the protections that are supposed to be offered by the SEP.

The cost of this exploit (to police and other law enforcement)? About $3,000-$5,000 per phone. Or (if you like to buy rather than rent) about $15,000. Also, just to add an element of comedy to the situation, the GrayKey source code appears to have recently been stolen. The attackers are extorting the company for two Bitcoin. Because 2018. (🤡👞)

Let me sum up my point in case I’m not beating you about the head quite enough:

The richest and most sophisticated phone manufacturer in the entire world tried to build a processor that achieved goals similar to those Ozzie requires. And as of April 2018, after five years of trying, they have been unable to achieve this goal, a goal that is critical to the security of the Ozzie proposal as I understand it.

Now obviously the lack of a secure processor today doesn’t mean such a processor will never exist. However, let me propose a general rule: if your proposal fundamentally relies on a secure lock that nobody can ever break, then it’s on you to show me how to build that lock.

Conclusion

While this mainly concludes my notes on Ozzie’s proposal, I want to end this post with a side note: a response to something I routinely hear from folks in the law enforcement community. This is the criticism that cryptographers are a bunch of naysayers who aren’t trying to solve “one of the most fundamental problems of our time”, and are instead just rejecting the problem with lazy claims that it “can’t work”.

As a researcher, my response to this is: phooey.

Cryptographers — myself most definitely included — love to solve crazy problems. We do this all the time. You want us to deploy a new cryptocurrency? No problem! Want us to build a system that conducts a sugar-beet auction using advanced multiparty computation techniques? Awesome. We’re there. No problem at all.

But there’s crazy and there’s crazy.

The reason so few of us are willing to bet on massive-scale key escrow systems is that we’ve thought about it and we don’t think it will work. We’ve looked at the threat model, the usage model, and the quality of hardware and software that exists today. Our informed opinion is that there’s no detection system for key theft, there’s no renewability system, HSMs are terrifically vulnerable (and the companies that make them are largely staffed with ex-intelligence employees), and insiders can be suborned. We’re not going to put the data of a few billion people on the line in an environment where we believe with high probability that the system will fail.

Maybe that’s unreasonable. If so, I can live with that.

by Matthew Green at April 26, 2018 12:26 PM

April 25, 2018

R.I.Pienaar

Choria Progress Update

It’s been a while since my previous update and quite a bit has happened since.

Choria Server

As previously mentioned, the Choria Server will eventually aim to replace mcollectived. Thus far I have focussed on its registration subsystem, Golang-based MCollective RPC compatible agents, and being able to embed it into other software for IoT and management backplanes.

Over the last few weeks I learned that MCollective will no longer be shipped in Puppet Agent version 6, which is currently due around Fall 2018. This means we have to accelerate making Choria standalone in its own right.

A number of things have to happen to get there:

  • Choria Server should support Ruby agents
  • The Ruby libraries Choria Server needs either need to be embedded and placed dynamically or provided via a Gem
  • The Ruby client needs to be provided via a Gem
  • New locations for these Ruby parts are needed outside of AIO Ruby

Yesterday I released the first step in this direction: you can now replace mcollectived with Choria Server. For now I am marking this as a preview/beta feature while we deal with issues the community finds.

The way this works is that we provide a small shim that uses just enough of MCollective to get the RPC framework running – luckily this was initially developed as an MCollective plugin and it retained a quite separate code base. When the Go code needs to invoke a Ruby agent it calls the shim; the shim in turn provides the result from the agent – in JSON format – back to Go.

This works for me with any agent I’ve tried it with and I am quite pleased with the results:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     10820  0.0  1.1 1306584 47436 ?       Sl   13:50   0:06 /opt/puppetlabs/puppet/bin/ruby /opt/puppetlabs/puppet/bin/mcollectived

MCollective would of course load all of Puppet as soon as any agent that uses Puppet is loaded – service, package, puppet – and so over time things only get worse. Here is Choria:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     32396  0.0  0.5 296436  9732 ?        Ssl  16:07   0:03 /usr/sbin/choria server --config=/etc/choria/server.conf

I have run a couple of hundred thousand instances of this and this is what you get; it never really changes. This is because Choria spawns the Ruby code, which exits when done.

This has the unfortunate side effect that the service, package and puppet agents are around 1 second slower per invocation, because loading Puppet is really slow. Agents that do not load Puppet are only marginally slower.

irb(main):002:0> Benchmark.measure { require "puppet" }.real
=> 0.619865644723177

There is a page set up dedicated to the Beta that details how to run it and what to look out for.

JSON pure protocol

Some of the breakage that you might run into – like mco facts not working with Choria Server – is due to a hugely significant change in the background. Choria – both plugged into MCollective and standalone – is JSON safe. The Ruby plugin is optionally so (and off by default), but the Choria daemon only supports JSON.

Traditionally MCollective has used YAML on the wire. The project is quite old: when the foundation for this choice was laid down, back in the early 2000s, JSON was really not that big a deal and XML was more important. Worse, MCollective has exposed Ruby-specific data types and YAML extensions on the wire, which has made creating cross-platform support nearly impossible.

YAML is also of course capable of carrying any object – which means some agents are just never going to be compatible with anything but Ruby. This was the case with the process agent, but I fixed that before shipping it in Choria. It also essentially means YAML deserialization can invoke things you might not have anticipated, which opens the door to big security problems.
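To illustrate the class of problem (the Ruby side has analogous !ruby/object tags), here is a minimal Python sketch; safe_load refuses language-specific tags, which is essentially the guarantee a JSON-only wire format gives you for free.

# Minimal illustration of why "YAML can invoke things you might not have
# anticipated": a document carrying a language-specific object tag.
# (Ruby's Psych has analogous !ruby/object tags; this is the Python flavour.)
import yaml

doc = "!!python/object/apply:os.system ['echo pwned']"

try:
    yaml.safe_load(doc)            # refuses custom/language-specific tags
except yaml.YAMLError as exc:
    print("rejected by safe_load:", exc)

print(yaml.safe_load('{"agent": "package", "action": "status"}'))  # plain data is fine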

For quite some time now the Choria protocol has been defined and versioned, and JSON schemas are available. The protocol makes the separation between Payload, Security, Transport and Federation much clearer, and it can now support anything that can move JSON – Middleware, REST, SSH, Postal Doves are all capable of carrying Choria packets.

There is a separate Golang implementation of the protocol that is transport agnostic and the schemas are there. Version 1 of the protocol is a tad skewed to MCollective but Version 2 (not yet planned) will drop those shackles. A single Choria Server is capable of serving multiple versions of the network protocol and communicate with old and new clients.

Golang being a static language with a really solid and completely compatible implementation of the protocol means that making implementations for other languages like Python will not be hard. However, I think the better long-term option for other languages is still a capable REST gateway.

I did some POC work on a very lightweight protocol suitable for devices like Arduino, and the Federation Brokers will provide bridging between the two worlds. You’ll be able to run mco rpc wallplug off: your client will talk the full Choria protocol, the wall plug might speak a super lightweight MQTT-based protocol, and you will not even notice.

There are some gotchas as a result of these changes, also captured in the Choria Server evaluation documentation. To resolve some of these I need to be much more aggressive with what I do to the MCollective libraries, something I can do once they are liberated out of Puppet Agent.

by R.I. Pienaar at April 25, 2018 08:25 AM

April 23, 2018

Vincent Bernat

A more privacy-friendly blog

When I started this blog, I embraced some free services, like Disqus or Google Analytics. These services are quite invasive for users’ privacy. Over the years, I have tried to correct this to reach a point where I do not rely on any “privacy-hostile” services.

Analytics🔗

Google Analytics is a ubiquitous way to get powerful analytics for free. It’s also a great way to provide data about your visitors to Google—also for free. There are self-hosted solutions like Matomo—previously Piwik.

I opted for a simpler solution: no analytics. It also enables me to think that my blog attracts thousands of visitors every day.

Fonts🔗

Google Fonts is a very popular font library and hosting service, which relies on the generic Google Privacy Policy. The google-webfonts-helper service makes it easy to self-host any font from Google Fonts. Moreover, with help from pyftsubset, I include only the characters used in this blog. The font files are lighter and more complete: no problem spelling “Antonín Dvořák”.
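As an illustration of the subsetting step, here is a small Python sketch driving pyftsubset (which ships with fonttools); the font path, the character set and the output name are made up, and the exact flags you want may differ.

# Hypothetical subsetting step: keep only the characters actually used on
# the site and emit a woff2 file. Paths and the character set are made up;
# pyftsubset is installed together with the fonttools package.
import subprocess

used_chars = set("Antonín Dvořák")   # in practice, collected from the generated HTML
subprocess.run(
    [
        "pyftsubset", "merriweather.ttf",
        "--text=" + "".join(sorted(used_chars)),
        "--flavor=woff2",
        "--output-file=merriweather.subset.woff2",
    ],
    check=True,
)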

Videos🔗

  • Before: YouTube
  • After: self-hosted

Some articles are supported by a video (like “OPL2LPT: an AdLib sound card for the parallel port“). In the past, I was using YouTube, mostly because it was the only free platform with an option to disable ads. Streaming on-demand videos is usually deemed quite difficult. For example, if you just use the <video> tag, you may push a video that is too big for people with a slow connection. However, it is not that hard, thanks to hls.js, which makes it possible to deliver video sliced into segments available at different bitrates. Users with JavaScript disabled are still served a progressive version of medium quality.

In “Self-hosted videos with HLS”, I explain this approach in more detail.

Comments🔗

Disqus is a popular comment solution for static websites. They were recently acquired by Zeta Global, a marketing company, and their business model is supported only by advertisements. On the technical side, Disqus also loads several hundred kilobytes of resources. Therefore, many websites load Disqus on demand. That’s what I did. This doesn’t solve the privacy problem, and I had the feeling people were less eager to leave a comment if they had to perform an additional action.

For some time, I thought about implementing my own comment system around Atom feeds. Each page would get its own feed of comments. A piece of JavaScript would turn these feeds into HTML and comments could still be read without JavaScript, thanks to the default rendering provided by browsers. People could also subscribe to these feeds: no need for mail notifications! The feeds would be served as static files and updated on new comments by a small piece of server-side code. Again, this could work without JavaScript.

Day Planner by Fowl Language Comics
Fowl Language Comics: Day Planner or the real reason why I didn't code a new comment system.

I still think this is a great idea. But I didn’t feel like developing and maintaining a new comment system. There are several self-hosted alternatives, notably Isso and Commento. Isso is a bit more featureful, with notably an imperfect import from Disqus. Both are struggling with maintenance and are trying to become sustainable with a paid hosted version.1 Commento is more privacy-friendly as it doesn’t use cookies at all. However, cookies from Isso are not essential and can be filtered with nginx:

proxy_hide_header Set-Cookie;
proxy_hide_header X-Set-Cookie;
proxy_ignore_headers Set-Cookie;

In Isso, there are currently no mail notifications, but I have added an Atom feed for each comment thread.

Another option would have been to not provide comments anymore. However, I had some great contributions as comments in the past and I also think they can work as some kind of peer review for blog articles: they are a weak guarantee that the content is not totally wrong.

Search engine🔗

A way to provide a search engine for a personal blog is to provide a form for a public search engine, like Google. That’s what I did. I also slapped some JavaScript on top of that to make it look like it was not Google.

The solution here is easy: switch to DuckDuckGo, which lets you customize the search experience a bit:

<form id="lf-search" action="https://duckduckgo.com/">
  <input type="hidden" name="kf" value="-1">
  <input type="hidden" name="kaf" value="1">
  <input type="hidden" name="k1" value="-1">
  <input type="hidden" name="sites" value="vincent.bernat.im/en">
  <input type="submit" value="">
  <input type="text" name="q" value="" autocomplete="off" aria-label="Search">
</form>

The JavaScript part is also removed, as DuckDuckGo doesn’t provide an API. As it is unlikely that more than three people will use the search engine in a year, it seems a good idea not to spend too much time on this non-essential feature.

Newsletter🔗

  • Before: RSS feed
  • After: still RSS feed but also a MailChimp newsletter

Nowadays, RSS feeds are far less popular than they were before. I am still baffled as to why a technical audience wouldn’t use RSS, but some readers prefer to receive updates by mail.

MailChimp is a common solution to send newsletters. It provides a simple integration with RSS feeds to trigger a mail each time new items are added to the feed. From a privacy point of view, MailChimp seems a good citizen: data collection is mainly limited to the amount needed to operate the service. Privacy-conscious users can still avoid this service and use the RSS feed.

Less JavaScript🔗

  • Before: third-party JavaScript code
  • After: self-hosted JavaScript code

Many privacy-conscious people are disabling JavaScript or using extensions like uMatrix or NoScript. Except for comments, I was using JavaScript only for non-essential stuff:

For mathematical formulae, I have switched from MathJax to KaTeX. The latter is faster but also enables server-side rendering: it produces the same output regardless of browser. Therefore, client-side JavaScript is not needed anymore.

For sidenotes, I have turned the JavaScript code doing the transformation into Python code, with pyquery. No more client-side JavaScript for this aspect either.
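As a rough idea of what such a build-time transformation can look like, here is a minimal pyquery sketch; the span.sidenote markup and the output structure are hypothetical, not the blog’s actual templates.

# Hypothetical build-time transformation with pyquery: turn inline
# <span class="sidenote"> elements into numbered references plus a list of
# notes, so no client-side JavaScript is needed. Markup is made up.
from pyquery import PyQuery as pq

html = '<p>KaTeX renders math<span class="sidenote">rendered server-side</span>.</p>'
doc = pq(html)

notes = []
for index, el in enumerate(doc("span.sidenote").items(), start=1):
    notes.append(f"{index}. {el.text()}")
    el.replaceWith(f'<sup class="note-ref">{index}</sup>')  # jQuery-style API

print(str(doc))
print("\n".join(notes))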

The remaining code is still here but is self-hosted.

Memento: CSP🔗

The HTTP Content-Security-Policy header controls the resources that a user agent is allowed to load for a given page. It is a safeguard and a memento for the external resources a site will use. Mine is moderately complex and shows what to expect from a privacy point of view:3

Content-Security-Policy:
  default-src 'self' blob:;
  script-src  'self' blob: https://d1g3mdmxf8zbo9.cloudfront.net/js/;
  object-src  'self' https://d1g3mdmxf8zbo9.cloudfront.net/images/;
  img-src     'self' data: https://d1g3mdmxf8zbo9.cloudfront.net/images/;
  frame-src   https://d1g3mdmxf8zbo9.cloudfront.net/images/;
  style-src   'self' 'unsafe-inline' https://d1g3mdmxf8zbo9.cloudfront.net/css/;
  font-src    'self' about: data: https://d1g3mdmxf8zbo9.cloudfront.net/fonts/;
  worker-src  blob:;
  media-src   'self' blob: https://luffy-video.sos-ch-dk-2.exo.io;
  connect-src 'self' https://luffy-video.sos-ch-dk-2.exo.io https://comments.luffy.cx;
  frame-ancestors 'none';
  block-all-mixed-content;

I am quite happy having been able to reach this result. 😊


  1. For Isso, look at comment.sh. For Commento, look at commento.io↩︎

  2. You may have noticed I am a footnote sicko and use them all the time for pointless stuff. ↩︎

  3. I don’t have issue with using a CDN like CloudFront: it is a paid service and Amazon AWS is not in the business of tracking users. ↩︎

by Vincent Bernat at April 23, 2018 08:01 AM

April 21, 2018

Cryptography Engineering

Wonk post: chosen ciphertext security in public-key encryption (Part 1)

In general I try to limit this blog to posts that focus on generally-applicable techniques in cryptography. That is, I don’t focus on the deeply wonky. But this post is going to be an exception. Specifically, I’m going to talk about a topic that most “typical” implementers don’t — and shouldn’t — think about.

Specifically: I’m going to talk about various techniques for making public key encryption schemes chosen ciphertext secure. I see this as the kind of post that would have saved me ages of reading when I was a grad student, so I figured it wouldn’t hurt to write it all down.

Background: CCA(1/2) security

Early (classical) ciphers used a relatively weak model of security, if they used one at all. That is, the typical security model for an encryption scheme was something like the following:

  1. I generate an encryption key (or keypair for public-key encryption)
  2. I give you the encryption of some message of my choice
  3. You “win” if you can decrypt it

This is obviously not a great model in the real world, for several reasons. First off, in some cases the attacker knows a lot about the message to be decrypted. For example: it may come from a small space (like a set of playing cards). For this reason we require a stronger definition like “semantic security” that assumes the attacker can choose the plaintext distribution, and can also obtain the encryption of messages of his/her own choice. I’ve written more about this here.

More relevant to this post, another limitation of the above game is that — in some real-world examples — the attacker has even more power. That is: in addition to obtaining the encryption of chosen plaintexts, they may be able to convince the secret keyholder to decrypt chosen ciphertexts of their choice.

The latter attack is called a chosen-ciphertext (CCA) attack.

At first blush this seems like a really stupid model. If you can ask the keyholder to decrypt chosen ciphertexts, then isn’t the scheme just obviously broken? Can’t you just decrypt anything you want?

The answer, it turns out, is that there are many real-life examples where the attacker has decryption capability, but the scheme isn’t obviously broken. For example:

  1. Sometimes an attacker can decrypt a limited set of ciphertexts (for example, because someone leaves the decryption machine unattended at lunchtime.) The question then is whether they can learn enough from this access to decrypt other ciphertexts that are generated after she loses access to the decryption machine — for example, messages that are encrypted after the operator comes back from lunch.
  2. Sometimes an attacker can submit any ciphertext she wants — but will only obtain a partial decryption of the ciphertext. For example, she might learn only a single bit of information such as “did this ciphertext decrypt correctly”. The question, then, is whether she can leverage this tiny amount of data to fully decrypt some ciphertext of her choosing.

The first example is generally called a “non-adaptive” chosen ciphertext attack, or a CCA1 attack (and sometimes, historically, a “lunchtime” attack). There are a few encryption schemes that totally fall apart under this attack — the most famous textbook example is Rabin’s public key encryption scheme, which allows you to recover the full secret key from just a single chosen-ciphertext decryption.

The more powerful second example is generally referred to as an “adaptive” chosen ciphertext attack, or a CCA2 attack. The term refers to the idea that the attacker can select the ciphertexts they try to decrypt based on seeing a specific ciphertext that they want to attack, and by seeing the answers to specific decryption queries.

In this article we’re going to use the more powerful “adaptive” (CCA2) definition, because that subsumes the CCA1 definition. We’re also going to focus primarily on public-key encryption.

With this in mind, here is the intuitive definition of the experiment we want a CCA2 public-key encryption scheme to be able to survive:

  1. I generate an encryption keypair for a public-key scheme and give you the public key.
  2. You can send me (sequentially and adaptively) many ciphertexts, which I will decrypt with my secret key. I’ll give you the result of each decryption.
  3. Eventually you’ll send me a pair of messages (of equal length) M_0, M_1 and I’ll pick a bit b at random, and return to you the encryption of M_b, which I will denote as C^* \leftarrow {\sf Encrypt}(pk, M_b).
  4. You’ll repeat step (2), sending me ciphertexts to decrypt. If you send me C^* I’ll reject your attempt. But I’ll decrypt any other ciphertext you send me, even if it’s only slightly different from C^*.
  5. The attacker outputs their guess b'. They “win” the game if b'=b.

We say that our scheme is secure if the attacker cannot win with probability significantly greater than they would achieve by simply guessing b' at random. Since they can win this game with probability 1/2 just by guessing randomly, that means we want (Probability attacker wins the game) – 1/2 to be “very small” (typically a negligible function of the security parameter).

You should notice two things about this definition. First, it gives the attacker the full decryption of any ciphertext they send me. This is obviously much more powerful than just giving the attacker a single bit of information, as we mentioned in the example further above. But note that powerful is good. If our scheme can remain secure in this powerful experiment, then clearly it will be secure in a setting where the attacker gets strictly less information from each decryption query.

The second thing you should notice is that we impose a single extra condition in step (4), namely that the attacker cannot ask us to decrypt C^*. We do this only to prevent the game from being “trivial” — if we did not impose this requirement, the attacker could always just hand us back C^* to decrypt, and they would always learn the value of b.

(Notice as well that we do not give the attacker the ability to request encryptions of chosen plaintexts. We don’t need to do that in the public key encryption version of this game, because we’re focusing exclusively on public-key encryption here — since the attacker has the public key, she can encrypt anything she wants without my help.)
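For readers who prefer code to prose, here is a minimal Python sketch of the game above as a harness; keygen, encrypt and decrypt are abstract placeholders standing in for a real public-key scheme, and the point is only to pin down who may query what.

# Minimal sketch of the adaptive (CCA2) game for public-key encryption.
# keygen/encrypt/decrypt are placeholders for a real scheme.
import secrets

def cca2_game(keygen, encrypt, decrypt, adversary):
    pk, sk = keygen()
    state = {"challenge": None, "b": None}

    def decryption_oracle(ct):
        # Available before and after the challenge; the challenge itself is refused.
        if ct == state["challenge"]:
            raise ValueError("cannot decrypt the challenge ciphertext")
        return decrypt(sk, ct)

    def challenge_oracle(m0, m1):
        assert len(m0) == len(m1) and state["challenge"] is None
        state["b"] = secrets.randbelow(2)
        state["challenge"] = encrypt(pk, [m0, m1][state["b"]])
        return state["challenge"]

    guess = adversary(pk, decryption_oracle, challenge_oracle)
    return guess == state["b"]   # advantage = Pr[win] - 1/2 over many runs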

With definitions out of the way, let’s talk a bit about how we achieve CCA2 security in real schemes.

A quick detour: symmetric encryption

This post is mainly going to focus on public-key encryption, because that’s actually the problem that’s challenging and interesting to solve. It turns out that achieving CCA2 for symmetric-key encryption is really easy. Let me briefly explain why this is, and why the same ideas don’t work for public-key encryption.

(To explain this, we’ll need to slightly tweak the CCA2 definition above to make it work in the symmetric setting. The changes here are small: we won’t give the attacker a public key in step (1), and at steps (2) and (4) we will allow the attacker to request the encryption of chosen plaintexts as well as the decryption.)

The first observation is that many common encryption schemes — particularly, the widely-used cipher modes of operation like CBC and CTR — are semantically secure in a model where the attacker does not have the ability to decrypt chosen ciphertexts. However, these same schemes break completely in the CCA2 model.

The simple reason for this is ciphertext malleability. Take CTR mode, which is particularly easy to mess with. Say we’ve obtained a ciphertext C^* at step (4) (recall that C^* is the encryption of M_b). It’s trivially easy to “maul” the ciphertext — simply by flipping, say, a bit of the message (i.e., XORing it with “1”). This gives us a new ciphertext C' = C^* \oplus 1 that we are allowed (by the rules of the game) to submit for decryption, and we obtain M_b \oplus 1, which we can use to figure out b.
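To make the mauling concrete, here is a minimal Python demonstration with AES-CTR from the cryptography package; the key, nonce and message are made up for illustration, and in the actual game the attacker only sees ciphertexts and gets decryptions from the oracle.

# CTR-mode malleability: flipping a ciphertext bit flips the same plaintext bit.
# Key, nonce and message are made up for illustration.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, nonce = os.urandom(32), os.urandom(16)

def ctr(data):                           # CTR encryption and decryption are identical
    ctx = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return ctx.update(data) + ctx.finalize()

c_star = ctr(b"pay 100 to alice")        # the challenge ciphertext

mauled = bytearray(c_star)
mauled[4] ^= 0x08                        # '1' ^ 0x08 == '9'
print(ctr(bytes(mauled)))                # b'pay 900 to alice'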

(A related, but “real world” variant of this attack is Vaudenay’s Padding Oracle Attack, which breaks actual implementations of symmetric-key cryptosystems. Here’s one we did against Apple iMessage. Here’s an older one on XML encryption.)

So how do we fix this problem? The straightforward observation is that we need to prevent the attacker from mauling the ciphertext C^*. The generic approach to doing this is to modify the encryption scheme so that it includes a Message Authentication Code (MAC) tag computed over every CTR-mode ciphertext. The key for this MAC scheme is generated by the encrypting party (me) and kept with the encryption key. When asked to decrypt a ciphertext, the decryptor first checks whether the MAC is valid. If it’s not, the decryption routine will output “ERROR”. Assuming an appropriate MAC scheme, the attacker can’t modify the ciphertext (including the MAC) without causing the decryption to fail and produce a useless result.

So in short: in the symmetric encryption setting, the answer to CCA2 security is simply for the encrypting parties to authenticate each ciphertext using a secret authentication (MAC) key they generate. Since we’re talking about symmetric encryption, that extra (secret) authentication key can be generated and stored with the decryption key. (Some more efficient schemes make this all work with a single key, but that’s just an engineering convenience.) Everything works out fine.
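A minimal Python sketch of that encrypt-then-MAC pattern follows; the key handling and wire format are simplified for illustration, and a real design would normally reach for a vetted AEAD mode instead of hand-rolling this.

# Encrypt-then-MAC sketch: AES-CTR plus an HMAC tag over nonce||ciphertext.
# The decryptor verifies the tag before touching the ciphertext.
import hashlib, hmac, os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

enc_key, mac_key = os.urandom(32), os.urandom(32)

def encrypt(plaintext):
    nonce = os.urandom(16)
    ctx = Cipher(algorithms.AES(enc_key), modes.CTR(nonce)).encryptor()
    ct = ctx.update(plaintext) + ctx.finalize()
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def decrypt(blob):
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return "ERROR"                   # reject any mauled ciphertext
    ctx = Cipher(algorithms.AES(enc_key), modes.CTR(nonce)).decryptor()
    return ctx.update(ct) + ctx.finalize()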

So now we get to the big question.

CCA security is easy in symmetric encryption. Why can’t we just do the same thing for public-key encryption?

As we saw above, it turns out that strong authenticated encryption is sufficient to get CCA(2) security in the world of symmetric encryption. Sadly, when you try this same idea generically in public key encryption, it doesn’t always work. There’s a short reason for this, and a long one. The short version is: it matters who is doing the encryption.

Let’s focus on the critical difference. In the symmetric CCA2 game above, there is exactly one person who is able to (legitimately) encrypt ciphertexts. That person is me. To put it more clearly: the person who performs the legitimate encryption operations (and has the secret key) is also the same person who is performing decryption.

Even if the encryptor and decryptor aren’t literally the same person, the encryptor still has to be honest. (To see why this has to be the case, remember that the encryptor has the shared secret key! If that party were a bad guy, then the whole scheme would be broken, since they could just hand the secret key to the bad guys.) And once you’ve made the stipulation that the encryptor is honest, then you’re almost all the way there. It suffices simply to add some kind of authentication (a MAC or a signature) to any ciphertext she encrypts. At that point the decryptor only needs to determine whether any given ciphertext actually came from the (honest) encryptor, and avoid decrypting the bad ones. You’re done.

Public key encryption (PKE) fundamentally breaks all these assumptions.

In a public-key encryption scheme, the main idea is that anyone can encrypt a message to you, once they get a copy of your public key. The encryption algorithm may sometimes be run by good, honest people. But it can also be run by malicious people. It can be run by parties who are adversarial. The decryptor has to be able to deal with all of those cases. One can’t simply assume that the “real” encryptor is honest.

Let me give a concrete example of how this can hurt you. A couple of years ago I wrote a post about flaws in Apple iMessage, which (at the time) used a simple authenticated (public key) encryption scheme. The basic iMessage encryption algorithm used public key encryption (actually a combination of RSA with some AES thrown in for efficiency) so that anyone could encrypt a message to my key. For authenticity, it required that every message be signed with an ECDSA signature by the sender.

When I received a message, I would look up the sender’s public key and first make sure the signature was valid. This would prevent bad guys from tampering with the message in flight — e.g., executing nasty stuff like adaptive chosen ciphertext attacks. If you squint a little, this is almost exactly a direct translation of the symmetric crypto approach we discussed above. We’re simply swapping the MAC for a digital signature.

The problems with this scheme start to become apparent when we consider that there might be multiple people sending me ciphertexts. Let’s say the adversary is on the communication path and intercepts a signed message from you to me. They want to change (i.e., maul) the message so that they can execute some kind of clever attack. Well, it turns out this is simple. They simply rip off the honest signature and replace it with one they make themselves:

 

The new message is identical, but now appears to come from a different person (the attacker). Since the attacker has their own signing key, they can maul the encrypted message as much as they want, and sign new versions of that message. If you plug this attack into (a version of) the public-key CCA2 game up top, you see they’ll win quite easily. All they have to do is modify the challenge ciphertext C^* at step (4) to be signed with their own signing key, then change it by munging the CTR-mode encryption, and request the decryption of that ciphertext.

Of course, if I only accept messages signed by some original (guaranteed-to-be-honest) sender, this scheme might work out fine. But that’s not the point of public key encryption. In a real public-key scheme — like the one Apple iMessage was trying to build — I should be able to (safely) decrypt messages from anyone, and in that setting this naive scheme breaks down pretty badly.

Whew.

Ok, this post has gotten a bit long, and so far I haven’t actually gotten to the various “tricks” for adding chosen ciphertext security to real public key encryption schemes. That will have to wait until the next post, to come shortly.

by Matthew Green at April 21, 2018 03:40 PM

Vincent Bernat

OPL2 Audio Board: an AdLib sound card for Arduino

In a previous article, I presented the OPL2LPT, a sound card for the parallel port featuring a Yamaha YM3812 chip, also known as OPL2—the chip of the AdLib sound card. The OPL2 Audio Board for Arduino is another indie sound card using this chip. However, instead of relying on a parallel port, it uses a serial interface, which can be driven from an Arduino board or a Raspberry Pi. While the OPL2LPT targets retrogamers with real hardware, the OPL2 Audio Board cannot be used in the same way. Nonetheless, it can also be operated from ScummVM and DOSBox!

OPL2 Audio Board for Arduino
The OPL2 Audio Board over a “Grim Fandango” box.

Unboxing🔗

The OPL2 Audio Board can be purchased on Tindie, either as a kit or fully assembled. I have paired it with a cheap clone of the Arduino Nano. A library to drive the board is available on GitHub, along with some examples.

One of them is DemoTune.ino. It plays a short tune on three channels. It can be compiled and uploaded to the Arduino with PlatformIO—installable with pip install platformio—using the following command:1

$ platformio ci \
    --board nanoatmega328 \
    --lib ../../src \
    --project-option="targets=upload" \
    --project-option="upload_port=/dev/ttyUSB0" \
    DemoTune.ino
[...]
PLATFORM: Atmel AVR > Arduino Nano ATmega328
SYSTEM: ATMEGA328P 16MHz 2KB RAM (30KB Flash)
Converting DemoTune.ino
[...]
Configuring upload protocol...
AVAILABLE: arduino
CURRENT: upload_protocol = arduino
Looking for upload port...
Use manually specified: /dev/ttyUSB0
Uploading .pioenvs/nanoatmega328/firmware.hex
[...]
avrdude: 6618 bytes of flash written
[...]
===== [SUCCESS] Took 5.94 seconds =====

Immediately after the upload, the Arduino plays the tune. 🎶

The next interesting example is SerialIface.ino. It turns the audio board into a sound card over the serial port. Once the code has been pushed to the Arduino, you can use the play.py program in the same directory to play VGM files. VGM is a sample-accurate sound format for many sound chips: it logs the exact commands sent. There are many such files on VGMRips. Be sure to choose the ones for the YM3812/OPL2! Here is a small selection:

The OPL2 Audio Board playing some VGM files. It is connected to an Arduino Nano. You can see the LEDs blinking when the Arduino receives the commands from the serial port.

Usage with DOSBox & ScummVM🔗

Notice

The support for the serial protocol used in this section has not been merged yet. In the meantime, grab SerialIface.ino from the pull request: git checkout 50e1717.

When the Arduino is flashed with SerialIface.ino, the board can be driven through a simple protocol over the serial port. By patching DOSBox and ScummVM, we can make them use this unusual sound card. Here are some examples of games:

  • 0:00, with DOSBox, the first level of Doom 🎮 (1993)
  • 1:06, with DOSBox, the introduction of Loom 🎼 (1990)
  • 2:38, with DOSBox, the first level of Lemmings 🐹 (1991)
  • 3:32, with DOSBox, the introduction of Legend of Kyrandia 🃏 (1992)
  • 6:47, with ScummVM, the introduction of Day of the Tentacle ☢️ (1993)
  • 11:10, with DOSBox, the introduction of Another World 🐅 (1991)

Another World (also known as Out of This World), designed by Éric Chahi, uses sampled sounds at 5 kHz or 10 kHz. With a serial port operating at 115,200 bits/s, the 5 kHz option is just within our reach. However, I have no idea if the rendering is faithful.

Update (2018.05)

After some discussions with Walter van Niftrik, we came to the conclusion that, above 1 kHz, DOSBox doesn’t “time-accurately” execute OPL commands. It uses a time slice of one millisecond during which it executes either a fixed number of CPU cycles or as many as possible (with cycles=max). In both cases, emulated CPU instructions are executed as fast as possible and I/O delays are simulated by removing a fixed number of cycles from the allocation of the current time slice.

DOSBox🔗

The serial protocol is described in the SerialIface.ino file:

/*
 * A very simple serial protocol is used.
 *
 * - Initial 3-way handshake to overcome reset delay / serial noise issues.
 * - 5-byte binary commands to write registers.
 *   - (uint8)  OPL2 register address
 *   - (uint8)  OPL2 register data
 *   - (int16)  delay (milliseconds); negative -> pre-delay; positive -> post-delay
 *   - (uint8)  delay (microseconds / 4)
 *
 * Example session:
 *
 * Arduino: HLO!
 * PC:      BUF?
 * Arduino: 256 (switches to binary mode)
 * PC:      0xb80a014f02 (write OPL register and delay)
 * Arduino: k
 *
 * A variant of this protocol is available without the delays. In this
 * case, the BUF? command should be sent as B0F? The binary protocol
 * is now using 2-byte binary commands:
 *   - (uint8)  OPL2 register address
 *   - (uint8)  OPL2 register data
 */

Adding support for this protocol in DOSBox is relatively simple (patch). For best performance, we use the 2-byte variant (5000 ops/s). The binary commands are pipelined and a dedicated thread collects the acknowledgments. A semaphore captures the number of free slots in the receive buffer. As it is not possible to read registers, we rely on DOSBox to emulate the timers, which are mostly used to let the various games detect the OPL2.
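
To give a feel for what a client of this protocol looks like, here is a minimal Python sketch (using pyserial) of the same idea the DOSBox patch implements: the handshake, pipelined 2-byte register writes, and a semaphore that is refilled by a thread collecting the "k" acknowledgments. The port path, the exact handshake framing and the ack handling of the 2-byte variant are assumptions on my part, so treat this as an illustration rather than a drop-in tool.

import re
import threading
import serial                      # pyserial

PORT = "/dev/ttyUSB0"              # assumption: adjust to your device
ser = serial.Serial(PORT, 115200, timeout=2)

# Handshake: wait for the Arduino's greeting, then request the 2-byte
# (no-delay) variant of the protocol and read back the buffer size.
while b"HLO!" not in ser.readline():
    pass
ser.write(b"B0F?")
match = re.search(rb"\d+", ser.readline())
rx_buffer = int(match.group()) if match else 256

# One semaphore slot per 2-byte command the Arduino can buffer;
# a background thread releases a slot for every acknowledgment.
slots = threading.Semaphore(rx_buffer // 2)

def collect_acks():
    while True:
        if ser.read(1) == b"k":
            slots.release()

threading.Thread(target=collect_acks, daemon=True).start()

def opl_write(reg, value):
    """Pipeline one 2-byte register write, blocking only when the buffer is full."""
    slots.acquire()
    ser.write(bytes([reg & 0xFF, value & 0xFF]))

# Example: key-on then key-off on channel 0 (register 0xB0).
opl_write(0xB0, 0x31)
opl_write(0xB0, 0x11)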

The patch is tested only on Linux but should work on any POSIX system—not Windows. To test it, you need to build DOSBox from source:

$ sudo apt build-dep dosbox
$ git clone https://github.com/vincentbernat/dosbox.git -b feature/opl2audioboard
$ cd dosbox
$ ./autogen.sh
$ ./configure && make

Replace the sblaster section of ~/.dosbox/dosbox-SVN.conf:

[sblaster]
sbtype=none
oplmode=opl2
oplrate=49716
oplemu=opl2arduino
opl2arduino=/dev/ttyUSB0

Then, run DOSBox with ./src/dosbox. That’s it!

You will likely get the “OPL2Arduino: too slow, consider increasing buffer” message a lot. To fix this, you need to recompile SerialIface.ino with a bigger receive buffer:

$ platformio ci \
    --board nanoatmega328 \
    --lib ../../src \
    --project-option="targets=upload" \
    --project-option="upload_port=/dev/ttyUSB0" \
    --project-option="build_flags=-DSERIAL_RX_BUFFER_SIZE=512" \
    SerialIface.ino

ScummVM🔗

The same code can be adapted for ScummVM (patch). To test, build it from source:

$ sudo apt build-dep scummvm
$ git clone https://github.com/vincentbernat/scummvm.git -b feature/opl2audioboard
$ cd scummvm
$ ./configure --disable-all-engines --enable-engine=scumm && make

Then, you can start ScummVM with ./scummvm. Select “AdLib Emulator” as the music device and “OPL2 Arduino” as the AdLib emulator.2 Like for DOSBox, watch the console to check if you need a larger receive buffer.

Enjoy! 😍


  1. This command is valid for an Arduino Nano. For another board, take a look at the output of platformio boards arduino↩︎

  2. If you need to specify a serial port other than /dev/ttyUSB0, add a line opl2arduino_device= in the ~/.scummvmrc configuration file. ↩︎

by Vincent Bernat at April 21, 2018 09:19 AM

April 20, 2018

Sarah Allen

false dichotomy of control vs sharing

Email is the killer app of the Internet. Amidst many sharing and collaboration applications and services, most of us frequently fall back to email. Marc Stiegler suggests that email often “just works better”. Why is this?

Digital communication is fast across distances and allows access to incredible volumes of information, yet digital access controls typically force us into a false dichotomy of control vs sharing.

Looking at physical models of sharing and access control, we can see that we already have well-established models where we can give up control temporarily, yet not completely.

Alan Karp illustrated this nicely at last week’s Internet Identity Workshop (IIW) in a quick anecdote:

Marc gave me the key to his car so I could park it in my garage. I couldn't do it, so I gave the key to my kid, and asked my neighbor to do it for me. She stopped by my house, got the key and used it to park Marc's car in my garage.

The car key scenario is clear. In addition to possession of the key, there's even another layer of control — if my kid doesn't have a driver's license, then he can't drive the car, even if he holds the key.

When we translate this story to our modern digital realm, it sounds crazy:

Marc gave me his password so I could copy a file from his computer to mine. I couldn't do it, so I gave Marc's password to my kid, and asked my neighbor to do it for me. She stopped by my house so my kid could tell her the password, and then she used it to copy the file from Marc's computer to mine.

After the conference, I read Marc Stiegler's 2009 paper Rich Sharing for the Web, which details key features of sharing that we have in the real world, the very features illustrated in the anecdote that Alan so effectively rattled off.

These 6 features (enumerated below) enable people to create networks of access rights that implement the Principle of Least Authority (POLA). The key is to limit how much you need to trust someone before sharing. “Systems that do not implement these 6 features will feel rigid and inadequately functional once enough users are involved, forcing the users to seek alternate means to work around the limitations in those applications.”

  1. Dynamic: I can grant access quickly and effortlessly (without involving an administrator).
  2. Attenuated: To give you permission to do or see one thing, I don’t have to give you permission to do everything. (e.g. valet key allows driving, but not access to the trunk)
  3. Chained: Authority may be delegated (and re-delegated).
  4. Composable: I have permission to drive a car from the State of California, and Marc’s car key. I require both permissions together to drive the car.
  5. Cross-jurisdiction: There are three families involved, each with its own policies, yet there's no need to communicate policies to another jurisdiction. In the example, I didn't need to ask Marc to change his policy to grant my neighbor permission to drive his car.
  6. Accountable: If Marc finds a new scratch on his car, he knows to ask me to pay for the repair. It’s up to me to collect from my neighbor. Digital access control systems will typically record who did which action, but don’t record who asked an administrator to grant permission.

Note: Accountability is not always directly linked to delegation. Marc would likely hold me accountable if his car got scratched, even if my neighbor had damaged the car when parking it in the garage. Whereas, if it isn't my garage, but rather a repair shop where my neighbor drops off the car for Marc, then if the repair shop damages the car, Marc would hold them responsible.

How does this work for email?

The following examples from Marc’s paper were edited for brevity:

  • Dynamic: You can send email to anyone any time.
  • Attenuated: When I email you an attachment, I’m sending a read-only copy. You don’t have access to my whole hard drive and you don’t expect that modifying it will change my copy.
  • Chained: I can forward you an email. You can then forward it to someone else.
  • Cross-Domain: I can send email to people at other companies and organizations without needing permission from their IT departments.
  • Composable: I can include an attachment from email originating at one company with text or another attachment from another email and send it to whoever I want.
  • Accountable: If Alice asks Bob to edit a file and email it back, and Bob asks Carol to edit the file, and Bob then emails it back, Alice will hold Bob responsible if the edits are erroneous. If Carol (whom Alice may not know) emails her result directly to Alice, either Alice will ask Carol who she is before accepting the changes, or, if Carol includes the history of messages in the message, Alice will directly see, once again, that she should hold Bob responsible.

Further reading

Alan Karp’s IoT Position Paper compares several sharing tools across these 6 features and also discusses ZBAC (authoriZation-Based Access Control) where authorization is known as a “capability.” An object capability is an unforgeable token that both designates a resource and grants permission to access it.

by sarah at April 20, 2018 12:57 PM

April 07, 2018

Sarah Allen

zero-knowledge proof: trust without shared secrets

In cryptography we typically share a secret which allows us to decrypt future messages. Commonly this is a password that I make up and submit to a Web site, then later produce to verify I am the same person.

I missed Kazue Sako’s Zero Knowledge Proofs 101 presentation at IIW last week, but Rachel Myers shared an impressively simple retelling in the car on the way back to San Francisco, which inspired me to read the notes and review the proof for myself. I’ve attempted to reproduce this simple explanation below, also noting additional sources and related articles.

Zero Knowledge Proofs (ZKPs) are very useful when applied to internet identity — with an interactive exchange you can prove you know a secret without actually revealing the secret.

Understanding Zero Knowledge Proofs with simple math:

x -> f(x)

Simple one way function. Easy to go one way from x to f(x) but mathematically hard to go from f(x) to x.

The most common example is a hash function. Wired: What is Password Hashing? provides an accessible introduction to why hash functions are important to cryptographic applications today.

f(x) = g ^ x mod p

Known(public): g, p
* g is a constant
* p has to be prime

Easy to know x and compute g ^ x mod p but difficult to do in reverse.
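
In Python terms the forward direction is a cheap built-in, while going backwards is the discrete logarithm problem (the numbers below are toy values, just for illustration):

p = 2**127 - 1          # a large prime (a Mersenne prime)
g = 3                   # public constant
x = 123456789           # a secret exponent

fx = pow(g, x, p)       # easy: modular exponentiation is fast even for huge numbers
# Recovering x from fx, g and p is the discrete logarithm problem;
# no efficient classical algorithm is known for well-chosen parameters.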

Interactive Proof

Alice wants to prove to Bob that she knows x without giving any information about x. Bob already knows f(x). Alice can make f(x) public and then prove that she knows x through an interactive exchange with anyone on the Internet, in this case, Bob.

  1. Alice publishes f(x): g^x mod p
  2. Alice picks random number r
  3. Alice sends Bob u = g^r mod p
  4. Now Bob has an artifact based on that random number, but can’t actually calculate the random number
  5. Bob returns a challenge e, either 0 or 1
  6. Alice responds with v:
    If 0, v = r
    If 1, v = r + x
  7. Bob can now verify:
    If e == 0: Bob has the random number r (since v = r), as well as the publicly known variables, and can check that u == g^v mod p
    If e == 1: Bob checks that u * f(x) == g^v (mod p)

I believe step 6 is true based on Congruence of Powers, though I’m not sure that I’ve transcribed the e==1 case accurately with my limited ascii representation.

If r is truly random, uniformly distributed between zero and (p-1), this does not leak any information about x, which is pretty neat, yet not sufficient.

In order to ensure that Alice cannot be impersonated, multiple iterations are required along with the use of large numbers (see IIW session notes).
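
To make the interactive exchange concrete, here is a small Python sketch of the protocol above. The parameters are toy values chosen purely for illustration (as the previous paragraph notes, a real deployment needs large numbers and many iterations), and a prover who doesn't know x can still pass a single round half the time, which is exactly why the rounds are repeated.

import secrets

# Toy parameters, for illustration only
p = 2**127 - 1                # a Mersenne prime
g = 3                         # public constant
x = secrets.randbelow(p - 1)  # Alice's secret
fx = pow(g, x, p)             # published value f(x) = g^x mod p

def round_of_proof():
    r = secrets.randbelow(p - 1)   # 2. Alice picks a random r
    u = pow(g, r, p)               # 3. Alice sends u = g^r mod p
    e = secrets.randbelow(2)       # 5. Bob's challenge, 0 or 1
    v = r if e == 0 else r + x     # 6. Alice's response
    if e == 0:                     # 7. Bob's check
        return pow(g, v, p) == u
    return pow(g, v, p) == (u * fx) % p

# A cheating prover passes each round with probability 1/2,
# so many successful rounds become convincing.
assert all(round_of_proof() for _ in range(40))
print("proof accepted")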

Further Reading

by sarah at April 07, 2018 05:52 PM

April 05, 2018

Marios Zindilis

A small web application with Angular5 and Django

Django works well as the back-end of an application that uses Angular5 in the front-end. In my attempt to learn Angular5 well enough to build a small proof-of-concept application, I couldn't find a simple working example of a combination of the two frameworks, so I created one. I called this the Pizza Maker. It's available on GitHub, and its documentation is in the README.

If you have any feedback for this, please open an issue on GitHub.

April 05, 2018 11:00 PM

April 03, 2018

R.I.Pienaar

Adding rich object data types to Puppet

Extending Puppet using types, providers, facts and functions is well known and widely done. Something new is adding entire new data types to the Puppet DSL to create entirely new language behaviours.

I’ve done a bunch of this recently with the Choria Playbooks and some other fun experiments; today I’ll walk through building a small network-wide spec system using the Puppet DSL.

Overview


A quick look at what we want to achieve here: I want to be able to do Choria RPC requests and assert their outcomes, I want to write tests using the Puppet DSL, and they should run on a specially prepared environment. In my case I have an AWS environment with CentOS, Ubuntu, Debian and Archlinux machines:

Below I test the File Manager Agent:

  • Get status for a known file and make sure it finds the file
  • Create a brand new file, ensure it reports success
  • Verify that the file exists and is empty using the status action

cspec::suite("filemgr agent tests", $fail_fast, $report) |$suite| {
 
  # Checks an existing file
  $suite.it("Should get file details") |$t| {
    $results = choria::task("mcollective", _catch_errors => true,
      "action" => "filemgr.status",
      "nodes" => $nodes,
      "silent" => true,
      "fact_filter" => ["kernel=Linux"],
      "properties" => {
        "file" => "/etc/hosts"
      }
    )
 
    $t.assert_task_success($results)
 
    $results.each |$result| {
      $t.assert_task_data_equals($result, $result["data"]["present"], 1)
    }
  }
 
  # Make a new file and check it exists
  $suite.it("Should support touch") |$t| {
    $fname = sprintf("/tmp/filemgr.%s", strftime(Timestamp(), "%s"))
 
    $r1 = choria::task("mcollective", _catch_errors => true,
      "action" => "filemgr.touch",
      "nodes" => $nodes,
      "silent" => true,
      "fact_filter" => ["kernel=Linux"],
      "fail_ok" => true,
      "properties" => {
        "file" => $fname
      }
    )
 
    $t.assert_task_success($r1)
 
    $r2 = choria::task("mcollective", _catch_errors => true,
      "action" => "filemgr.status",
      "nodes" => $nodes,
      "silent" => true,
      "fact_filter" => ["kernel=Linux"],
      "properties" => {
        "file" => $fname
      }
    )
 
    $t.assert_task_success($r2)
 
    $r2.each |$result| {
      $t.assert_task_data_equals($result, $result["data"]["present"], 1)
      $t.assert_task_data_equals($result, $result["data"]["size"], 0)
    }
  }
}

I also want to be able to test other things, like, let’s say, discovery:

  cspec::suite("${method} discovery method", $fail_fast, $report) |$suite| {
    $suite.it("Should support a basic discovery") |$t| {
      $found = choria::discover(
        "discovery_method" => $method,
      )
 
      $t.assert_equal($found.sort, $all_nodes.sort)
    }
  }

So we want to make a Spec-like system that can drive Puppet Plans (aka Choria Playbooks) and do various assertions on the outcome.

We want to run it with mco playbook run and it should write a JSON report to disk with all suites, cases and assertions.

Adding a new Data Type to Puppet


I’ll show how to add the Cspec::Suite data Type to Puppet. This comes in 2 parts: You have to describe the Type that is exposed to Puppet and you have to provide a Ruby implementation of the Type.

Describing the Objects


Here we create the signature for Cspec::Suite:

# modules/cspec/lib/puppet/datatypes/cspec/suite.rb
Puppet::DataTypes.create_type("Cspec::Suite") do
  interface <<-PUPPET
    attributes => {
      "description" => String,
      "fail_fast" => Boolean,
      "report" => String
    },
    functions => {
      it => Callable[[String, Callable[Cspec::Case]], Any],
    }
  PUPPET
 
  load_file "puppet_x/cspec/suite"
 
  implementation_class PuppetX::Cspec::Suite
end

As you can see from the line of code cspec::suite(“filemgr agent tests”, $fail_fast, $report) |$suite| {….} we pass 3 arguments: a description of the test, whether the test should fail immediately on any error or keep going, and where to write the report of the suite. These correspond to the attributes here. A function that will be shown later takes these and makes our instance.

We then have to add our it() function, which again takes a description and yields a `Cspec::Case`; it returns any value.

When Puppet needs the implementation of this code it will call the Ruby class PuppetX::Cspec::Suite.

Here is the same for the Cspec::Case:

# modules/cspec/lib/puppet/datatypes/cspec/case.rb
Puppet::DataTypes.create_type("Cspec::Case") do
  interface <<-PUPPET
    attributes => {
      "description" => String,
      "suite" => Cspec::Suite
    },
    functions => {
      assert_equal => Callable[[Any, Any], Boolean],
      assert_task_success => Callable[[Choria::TaskResults], Boolean],
      assert_task_data_equals => Callable[[Choria::TaskResult, Any, Any], Boolean]
    }
  PUPPET
 
  load_file "puppet_x/cspec/case"
 
  implementation_class PuppetX::Cspec::Case
end

Adding the implementation


The implementation is a Ruby class that provides the logic we want. I won’t show the entire thing with reporting and everything, but you’ll get the basic idea:

# modules/cspec/lib/puppet_x/cspec/suite.rb
module PuppetX
  class Cspec
    class Suite
      # Puppet calls this method when it needs an instance of this type
      def self.from_asserted_hash(description, fail_fast, report)
        new(description, fail_fast, report)
      end
 
      attr_reader :description, :fail_fast
 
      def initialize(description, fail_fast, report)
        @description = description
        @fail_fast = !!fail_fast
        @report = report
        @testcases = []
      end
 
      # what puppet file and line the Puppet DSL is on
      def puppet_file_line
        fl = Puppet::Pops::PuppetStack.stacktrace[0]
 
        [fl[0], fl[1]]
      end
 
      def outcome
        {
          "testsuite" => @description,
          "testcases" => @testcases,
          "file" => puppet_file_line[0],
          "line" => puppet_file_line[1],
          "success" => @testcases.all?{|t| t["success"]}
        }
      end
 
      # Writes the memory state to disk, see outcome above
      def write_report
        # ...
      end
 
      def run_suite
        Puppet.notice(">>>")
        Puppet.notice(">>> Starting test suite: %s" % [@description])
        Puppet.notice(">>>")
 
        begin
          yield(self)
        ensure
          write_report
        end
 
 
        Puppet.notice(">>>")
        Puppet.notice(">>> Completed test suite: %s" % [@description])
        Puppet.notice(">>>")
      end
 
      def it(description, &blk)
        require_relative "case"
 
        t = PuppetX::Cspec::Case.new(self, description)
        t.run(&blk)
      ensure
        @testcases << t.outcome
      end
    end
  end
end

And here is the Cspec::Case:

# modules/cspec/lib/puppet_x/cspec/case.rb
module PuppetX
  class Cspec
    class Case
      # Puppet calls this to make instances
      def self.from_asserted_hash(suite, description)
        new(suite, description)
      end
 
      def initialize(suite, description)
        @suite = suite
        @description = description
        @assertions = []
        @start_location = puppet_file_line
      end
 
      # assert 2 things are equal and show sender etc in the output
      def assert_task_data_equals(result, left, right)
        if left == right
          success("assert_task_data_equals", "%s success" % result.host)
          return true
        end
 
        failure("assert_task_data_equals: %s" % result.host, "%s\n\n\tis not equal to\n\n %s" % [left, right])
      end
 
      # checks the outcome of a choria RPC request and make sure its fine
      def assert_task_success(results)
        if results.error_set.empty?
          success("assert_task_success:", "%d OK results" % results.count)
          return true
        end
 
        failure("assert_task_success:", "%d failures" % [results.error_set.count])
      end
 
      # assert 2 things are equal
      def assert_equal(left, right)
        if left == right
          success("assert_equal", "values matches")
          return true
        end
 
        failure("assert_equal", "%s\n\n\tis not equal to\n\n %s" % [left, right])
      end
 
      # the puppet .pp file and line Puppet is on
      def puppet_file_line
        fl = Puppet::Pops::PuppetStack.stacktrace[0]
 
        [fl[0], fl[1]]
      end
 
      # show a OK message, store the assertions that ran
      def success(what, message)
        @assertions << {
          "success" => true,
          "kind" => what,
          "file" => puppet_file_line[0],
          "line" => puppet_file_line[1],
          "message" => message
        }
 
        Puppet.notice("&#x2714;︎ %s: %s" % [what, message])
      end
 
      # show a Error message, store the assertions that ran
      def failure(what, message)
        @assertions << {
          "success" => false,
          "kind" => what,
          "file" => puppet_file_line[0],
          "line" => puppet_file_line[1],
          "message" => message
        }
 
        Puppet.err("✘ %s: %s" % [what, @description])
        Puppet.err(message)
 
        raise(Puppet::Error, "Test case %s fast failed: %s" % [@description, what]) if @suite.fail_fast
      end
 
      # this will show up in the report JSON
      def outcome
        {
          "testcase" => @description,
          "assertions" => @assertions,
          "success" => @assertions.all? {|a| a["success"]},
          "file" => @start_location[0],
          "line" => @start_location[1]
        }
      end
 
      # invokes the test case
      def run
        Puppet.notice("==== Test case: %s" % [@description])
 
        # runs the puppet block
        yield(self)
 
        success("testcase", @description)
      end
    end
  end
end

Finally I am going to need a little function to create the suite: the cspec::suite function. It really just creates an instance of PuppetX::Cspec::Suite for us.

# modules/cspec/lib/puppet/functions/cspec/suite.rb
Puppet::Functions.create_function(:"cspec::suite") do
  dispatch :handler do
    param "String", :description
    param "Boolean", :fail_fast
    param "String", :report
 
    block_param
 
    return_type "Cspec::Suite"
  end
 
  def handler(description, fail_fast, report, &blk)
    suite = PuppetX::Cspec::Suite.new(description, fail_fast, report)
 
    suite.run_suite(&blk)
    suite
  end
end

Bringing it together


So that’s about it. It’s very simple really: the code above is pretty basic stuff to achieve all of this, and I hacked it together in a day, basically.

Let’s see how we turn these building blocks into a test suite.

I need an entry point that drives the suite. Imagine I will have many different plans to run, one per agent, and that I want to do some pre- and post-run tasks, etc.

plan cspec::suite (
  Boolean $fail_fast = false,
  Boolean $pre_post = true,
  Stdlib::Absolutepath $report,
  String $data
) {
  $ds = {
    "type"   => "file",
    "file"   => $data,
    "format" => "yaml"
  }
 
  # initializes the report
  cspec::clear_report($report)
 
  # force a puppet run everywhere so PuppetDB is up to date, disables Puppet, wait for them to finish
  if $pre_post {
    choria::run_playbook("cspec::pre_flight", ds => $ds)
  }
 
  # Run our test suite
  choria::run_playbook("cspec::run_suites", _catch_errors => true,
    ds => $ds,
    fail_fast => $fail_fast,
    report => $report
  )
    .choria::on_error |$err| {
      err("Test suite failed with a critical error: ${err.message}")
    }
 
  # enables Puppet
  if $pre_post {
    choria::run_playbook("cspec::post_flight", ds => $ds)
  }
 
  # reads the report from disk and creates a basic overview structure
  cspec::summarize_report($report)
}

Here’s the cspec::run_suites Playbook that takes data from a Choria data source and drives the suite dynamically:

plan cspec::run_suites (
  Hash $ds,
  Boolean $fail_fast = false,
  Stdlib::Absolutepath $report,
) {
  $suites = choria::data("suites", $ds)
 
  notice(sprintf("Running test suites: %s", $suites.join(", ")))
 
  choria::data("suites", $ds).each |$suite| {
    choria::run_playbook($suite,
      ds => $ds,
      fail_fast => $fail_fast,
      report => $report
    )
  }
}

And finally a YAML file defining the suite. This file describes my AWS environment that I use to do integration tests for Choria; you can see there are a bunch of other tests here in the suites list, and some of them take data like what nodes to expect.

suites:
  - cspec::discovery
  - cspec::choria
  - cspec::agents::shell
  - cspec::agents::process
  - cspec::agents::filemgr
  - cspec::agents::nettest
 
choria.version: mcollective plugin 0.7.0
 
nettest.fqdn: puppet.choria.example.net
nettest.port: 8140
 
discovery.all_nodes:
  - archlinux1.choria.example.net
  - centos7.choria.example.net
  - debian9.choria.example.net
  - puppet.choria.example.net
  - ubuntu16.choria.example.net
 
discovery.mcollective_nodes:
  - archlinux1.choria.example.net
  - centos7.choria.example.net
  - debian9.choria.example.net
  - puppet.choria.example.net
  - ubuntu16.choria.example.net
 
discovery.filtered_nodes:
  - centos7.choria.example.net
  - puppet.choria.example.net
 
discovery.fact_filter: operatingsystem=CentOS

Conclusion


So this then is a rather quick walk-through of extending Puppet in ways many of us would not have seen before. I spent about a day getting this all working, which included figuring out a way to maintain the mutating report state internally, etc. The outcome is a test suite I can run, and it will thoroughly drive a working 5-node network and assert the outcomes against real machines running real software.

I used to have an MCollective integration test suite, but I think this is a LOT nicer, mainly due to the Choria Playbooks and the extensibility of modern Puppet.

$ mco playbook run cspec::suite --data `pwd`/suite.yaml --report `pwd`/report.json

The current code for this is on GitHub along with some Terraform code to stand up a test environment, it’s a bit barren right now but I’ll add details in the next few weeks.

by R.I. Pienaar at April 03, 2018 09:23 AM

HolisticInfoSec.org

toolsmith #132 - The HELK vs APTSimulator - Part 2


Continuing where we left off in The HELK vs APTSimulator - Part 1, I will focus our attention on additional, useful HELK features to aid you in your threat hunting practice. HELK offers Apache Spark, GraphFrames, and Jupyter Notebooks as part of its lab offering. These capabilities scale well beyond a standard ELK stack; this is really where parallel computing and significantly improved processing and analytics take hold. This is a great way to introduce yourself to these technologies, all on a unified platform.

Let me break these down for you a little bit in case you haven't been exposed to these technologies yet. First and foremost, refer to @Cyb3rWard0g's wiki page on how he's designed it for his HELK implementation, as seen in Figure 1.
Figure 1: HELK Architecture
First, Apache Spark. For HELK, "Elasticsearch-hadoop provides native integration between Elasticsearch and Apache Spark, in the form of an RDD (Resilient Distributed Dataset) (or Pair RDD to be precise) that can read data from Elasticsearch." Per the Apache Spark FAQ, "Spark is a fast and general processing engine compatible with Hadoop data" to deliver "lightning-fast cluster computing."
Second, GraphFrames. From the GraphFrames overview, "GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs. GraphFrames represent graphs: vertices (e.g., users) and edges (e.g., relationships between users). GraphFrames also provide powerful tools for running queries and standard graph algorithms. With GraphFrames, you can easily search for patterns within graphs, find important vertices, and more." 
Finally, Jupyter Notebooks to pull it all together.
From Jupyter.org: "The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more." Jupyter Notebooks provide a higher order of analyst/analytics capabilities; if you haven't dipped your toe in that water, this may be your first, best opportunity.
Let's take a look at using Jupyter Notebooks with the data populated to my Docker-based HELK instance as implemented in Part 1. I repopulated my HELK instance with new data from a different, bare metal Windows instance reporting to HELK with Winlogbeat, Sysmon enabled, and looking mighty compromised thanks to @cyb3rops's APTSimulator.
To make use of Jupyter Notebooks, you need your JUPYTER CURRENT TOKEN to access the Jupyter Notebook web interface. It was presented to you when your HELK installation completed, but you can easily retrieve it via sudo docker logs helk-analytics, then copy and paste the URL into your browser to connect for the first time with a token. It will look like this,
http://localhost:8880/?token=3f46301da4cd20011391327647000e8006ee3574cab0b163, as described in the Installation wiki. After browsing to the URL with said token, you can begin at http://localhost:8880/lab, where you should immediately proceed to the Check_Spark_Graphframes_Integrations.ipynb notebook. It's found in the hierarchy menu under training > jupyter_notebooks > getting_started. This notebook is essential to confirming you're ingesting data properly with HELK and that its integrations are fully functioning. Step through it one cell at a time with the play button, allowing each task to complete so as to avoid errors. Remember the above mentioned Resilient Distributed Dataset? This notebook will create a Spark RDD on top of Elasticsearch using the logs-endpoint-winevent-sysmon-* (Sysmon logs) index as source, and do the same thing with the logs-endpoint-winevent-security-* (Window Security Event logs) index as source, as seen in Figure 2.
Figure 2: Windows Security EVT Spark RDD
The notebook will also query your Windows security events via Spark SQL, then print the schema with:
df = spark.read.format("org.elasticsearch.spark.sql").load("logs-endpoint-winevent-security-*/doc")
df.printSchema()
The result should resemble Figure 3.
Figure 3: Schema
Assuming all matches with relative consistency in your experiment, let's move on to the Sysmon_ProcessCreate_Graph.ipynb notebook, found in training > jupyter_notebooks. This notebook will again call on the Elasticsearch Sysmon index, create vertices and edges dataframes, then build a GraphFrame graph from those same vertices and edges. Here's a little walk-through.
The v parameter (yes, for vertices) is populated with:
v = df.withColumn("id", df.process_guid).select("id","user_name","host_name","process_parent_name","process_name","action")
v = v.filter(v.action == "processcreate")
Showing the top three rows of that result set, with v.show(3,truncate=False), appears as Figure 4 in the notebook, with the data from my APTSimulator "victim" system, N2KND-PC.
Figure 4: WTF, Florian :-)
The epic, uber threat hunter in me believes that APTSimulator created nslookup, 7z, and regedit as processes via cmd.exe. Genius, right? :-)
The e parameter (yes, for edges) is populated with:
e = df.filter(df.action == "processcreate").selectExpr("process_parent_guid as src","process_guid as dst").withColumn("relationship", lit("spawned"))
Showing the top three rows of that result set, with e.show(3,truncate=False), produces the source and destination process IDs as it pertains to the spawning relationship.
Now, create a graph from the vertices and edges dataframes, as defined in the v & e parameters, with g = GraphFrame(v, e). Let's bring it home with a hunt for Process A spawning Process B AND Process B spawning Process C; the code needed, and the result, are seen from the notebook in Figure 5.
Figure 5: APTSimulator's happy spawn
Oh, yes, APTSimulator fully realized in a nice graph. Great example seen in cmd.exe spawning wscript.exe, which then spawns rundll32.exe. Or cmd.exe spawning powershell.exe and schtasks.exe.
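Since the query itself only appears as an image in Figure 5, here is roughly what that two-hop hunt looks like as a GraphFrames motif search, reusing the v and e dataframes defined above (a sketch of the notebook's approach, not a verbatim copy):
from graphframes import GraphFrame
g = GraphFrame(v, e)
# Motif: process a spawned process b, and process b spawned process c
chains = g.find("(a)-[ab]->(b); (b)-[bc]->(c)")
chains.select("a.process_name", "b.process_name", "c.process_name").show(10, truncate=False)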
Need confirmation? Florian's CactusTorch JS dropper is detailed in Figure 6, specifically cmd.exe > wscript.exe > rundll32.exe.
Figure 6: APTSimulator source for CactusTorch
Still not convinced? How about APTSimulator's schtasks.bat, where APTSimulator kindly loads mimikatz with schtasks.exe for persistence, per Figure 7?
Figure 7: schtasks.bat
I certainly hope that the HELK's graph results matching nicely with APTSimulator source meets with your satisfaction.
The HELK vs APTSimulator ends with a glorious flourish: these two monsters in their field belong in every lab to practice red versus blue, attack and defend, compromise and detect. I haven't been this happy to be a practitioner in the defense against the dark arts in quite a while. My sincere thanks to Roberto and Florian for their great work on the HELK and APTSimulator. I can't suggest strongly enough how much you'll benefit from taking the time to run through Parts 1 and 2 of The HELK vs APTSimulator for yourself. Both tools are well documented on their respective GitHubs: go now, get started, profit.
Cheers...until next time.

by Russ McRee (noreply@blogger.com) at April 03, 2018 07:01 AM

April 01, 2018

That grumpy BSD guy

ed(1) mastery is a must for a real Unix person

ed(1) is the standard editor. Now there's a book out to help you master this fundamental Unix tool.

In some circles on the Internet, your choice of text editor is a serious matter.

We've all seen the threads on mailing lists, USENET news groups and web forums about the relative merits of Emacs vs vi, including endless iterations of flame wars, and sometimes even involving lesser known or non-portable editing environments.

And then of course, from the Linux newbies we have seen an endless stream of tweeted graphical 'memes' about the editor vim (aka 'vi Improved') versus the various apparently friendlier-to-some options such as GNU nano. Apparently, even exiting the 'improved' version of the classical and ubiquitous vi(1) editor is a challenge for a significant subset of the younger generation.

Yes, your choice of text editor or editing environment is a serious matter. Mainly because text processing is so fundamental to our interactions with computers.

But for those of us who keep our systems on a real Unix (such as OpenBSD or FreeBSD), there is no real contest. The OpenBSD base system contains several text editors including vi(1) and the almost-emacs mg(1), but ed(1) remains the standard editor.

Now Michael Lucas has written a book to guide the as yet uninitiated through the fundamentals of the original Unix text editor. It is worth keeping in mind that much of Unix and its original standard text editor were written back when the standard output and default user interface was more likely than not a printing terminal.

To some of us, reading and following the narrative of Ed Mastery is a trip down memory lane. To others, following along the text will illustrate the horror of the world of pre-graphic computer interfaces. For others again, the fact that ed(1) doesn't use your terminal settings much at all offers hope of fixing things when something or somebody screwed up your system so you don't have a working terminal for that visual editor.

ed(1) is a line editor. And while you may have heard mutters that 'vi is just a line editor in drag', vi(1) does offer a distinctly visual interface that only became possible with the advent of the video terminal, affectionately known as the glass teletype. ed(1) offers no such luxury, but as the book demonstrates, even ed(1) is able to display any part of a file's content for when you are unsure what your file looks like.

The book Ed Mastery starts by walking the reader through a series of editing sessions using the classical ed(1) line editing interface. To some readers the thought of editing text while not actually seeing at least a few lines at a time onscreen probably sounds scary. This book shows how it is done and, while the author never explicitly mentions it, the text aptly demonstrates how the ed(1) command set is in fact the precursor of how things are done in many Unix text processing programs.

As one might expect, the walkthrough of ed(1) text editing functionality is followed up by a sequence on searching and replacing which ultimately leads to a very readable introduction to regular expressions, which of course are part of the ed(1) package too. If you know your ed(1) command set, you are quite far along in the direction of mastering the stream editor sed(1), as well as a number of other systems where regular expressions play a crucial role.

After the basic editing functionality and some minor text processing magic has been dealt with, the book then proceeds to demonstrate ed(1) as a valuable tool in your Unix scripting environment. And once again, if you can do something with ed, you can probably transfer that knowledge pretty much intact to use with other Unix tools.

The eighty-some text pages of Ed Mastery are a source of solid information on the ed(1) tool itself with a good helping of historical context that will make it clearer to newcomers why certain design choices were made back when the Unix world was new. A number of these choices influence how we interact with the modern descendants of the Unix systems we had back then.

Your choice of text editor is a serious matter. With this book, you get a better foundation for choosing the proper tool for your text editing and text processing needs. I'm not saying that you have to switch to the standard editor, but after reading Ed Mastery, your choice of text editing and processing tools will be a much better-informed one.

Ed Mastery  is available now directly from Michael W. Lucas' books site at https://www.michaelwlucas.com/tools/ed, and will most likely appear in other booksellers' catalogs as soon as their systems are able to digest the new data.

Do read the book, try out the standard editor and have fun!

by Peter N. M. Hansteen (noreply@blogger.com) at April 01, 2018 10:21 AM

Colin Percival

Tarsnap pricing change

I launched the current Tarsnap website in 2009, and while we've made some minor adjustments to it over the years — e.g., adding a page of testimonials, adding much more documentation, and adding a page with .deb binary packages — the changes have overall been relatively modest. One thing people criticized the design for in 2009 was the fact that prices were quoted in picodollars; this is something I have insisted on retaining for the past eight years.

One of the harshest critics of Tarsnap's flat rate picodollars-per-byte pricing model is Patrick McKenzie — known to much of the Internet as "patio11" — who despite our frequent debates can take credit for ten times more new Tarsnap customers than anyone else, thanks to a single ten thousand word blog post about Tarsnap. The topic of picodollars has become something of an ongoing debate between us, with Patrick insisting that they communicate a fundamental lack of seriousness and sabotage Tarsnap's success as a business, and me insisting that they communicate exactly what I want to communicate, and attract precisely the customer base I want to have. In spite of our disagreements, however, I really do value Patrick's input; indeed, the changes I mentioned above came about in large part due to the advice I received from him, and for a long time I've been considering following more of Patrick's advice.

A few weeks ago, I gave a talk at the AsiaBSDCon conference about profiling the FreeBSD kernel boot. (I'll be repeating the talk at BSDCan if anyone is interested in seeing it in person.) Since this was my first time in Tokyo (indeed, my first time anywhere in Asia) and despite communicating with him frequently I had never met Patrick in person, I thought it was only appropriate to meet him for dinner; fortunately the scheduling worked out and there was an evening when he was free and I wasn't suffering too much from jetlag. After dinner, Patrick told me about a cron job he runs:

I knew then that the time was coming to make a change Patrick has long awaited: Getting rid of picodollars. It took a few weeks before the right moment arrived, but I'm proud to announce that as of today, April 1st 2018, Tarsnap's storage pricing is 8333333 attodollars per byte-day.

This addresses a long-standing concern I've had about Tarsnap's pricing: Tarsnap bills customers for usage on a daily basis, but since 250 picodollars is not a multiple of 30, usage bills have been rounded. Tarsnap's accounting code works with attodollars internally (Why attodollars? Because it's easy to have 18 decimal places of precision using fixed-point arithmetic with a 64-bit "attodollars" part.) and so during 30-day months I have in fact been rounding down and billing customers at a rate of 8333333 attodollars per byte-day for years — so making this change on the Tarsnap website brings it in line with the reality of the billing system.
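
For anyone who wants to check the unit conversion, it is just this (a quick sketch of the arithmetic, not Tarsnap's actual accounting code):

ATTO_PER_PICO = 10**6                     # 1 picodollar = 10^6 attodollars
per_byte_month = 250 * ATTO_PER_PICO      # 250 picodollars per byte-month
per_byte_day = per_byte_month // 30       # round down, as the billing does
print(per_byte_day)                       # 8333333 attodollars per byte-day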

Of course, there are other advantages to advertising Tarsnap's pricing in attodollars. Everything which was communicated by pricing storage in picodollars per byte-month is communicated even more effectively by advertising prices in attodollars per byte-day, and I have no doubt that Tarsnap users will appreciate the increased precision.

April 01, 2018 12:00 AM

March 27, 2018

LZone - Sysadmin

Sequence definitions with kwalify

After a lot of trial and error figuring out how to define a simple sequence in kwalify (which I use as a JSON/YAML schema validator), I want to share this solution for a YAML schema.

So my use case is whitelisting certain keys and ensuring their types, using kwalify to validate YAML files. Doing this for scalars is simple, but hashes and lists of scalar elements are not. Most problematic were the lists...

Defining Arbitrary Scalar Sequences

So how to define a list in kwalify? The user guide gives this example:
---
list:
  type: seq
  sequence:
     - type: str
This gives us a list of strings. But many lists also contain numbers, and some contain structured data. For my use case I want to exclude structured data AND allow numbers, so "type: any" cannot be used. "type: any" also wouldn't work because it would require defining the mapping for the nested data, which in a validation use case, where we just want to ensure the list's element type, we cannot know. The great thing is there is a type "text" which you can use to allow a list of strings or numbers or both, like this:
---
list:
  type: seq
  sequence:
     - type: text

Building a key name + type validation schema

As already mentioned the need for this is to have a whitelisting schema with simple type validation. Below you see an example for such a schema:
---
type: map
mapping:
  "default_definition": &allow_hash
     type: map
     mapping:
       =:
         type: any

"default_list_definition": &allow_list type: seq sequence: # Type text means string or number - type: text

"key1": *allow_hash "key2": *allow_list "key3": type: str

=: type: number range: { max: 29384855, min: 29384855 }
At the top there are two dummy keys "default_definition" and "default_list_definition" which we use to define two YAML references "allow_hash" and "allow_list" for generic hashes and scalar only lists.

In the middle of the schema you see three whitelisted keys which, via the references, are typed as hash, list and string respectively.

Finally for this to be a whitelist we need to refuse all other keys. Note that '=' as a key name stands for a default definition. Now we want to say: default is "not allowed". Sadly kwalify has no mechanism for this that allows expressing something like
---
  =:
    type: invalid
Therefore we resort to an absurd type definition (that we hopefully never match), for example a number that has to be exactly 29384855. All other keys not listed in the whitelist above will hopefully fail to be this number and cause kwalify to throw an error.

This is how the kwalify YAML whitelist works.
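
The validation itself can be run with the kwalify command line tool, or driven from Python via the pykwalify port. A minimal sketch, assuming the schema above is saved as schema.yaml and the document to check as data.yaml (pykwalify is a separate implementation, so corner cases may behave differently from Ruby kwalify):
from pykwalify.core import Core

# Validate data.yaml against the whitelist schema above (schema.yaml);
# raises a SchemaError if an unknown key or a wrong type shows up.
core = Core(source_file="data.yaml", schema_files=["schema.yaml"])
core.validate(raise_exception=True)
print("data.yaml passed the whitelist schema")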

March 27, 2018 08:59 PM

PyPI does brownouts for legacy TLS

Nice! Reading through the maintenance notices on my status page aggregator I learned that PyPI started intentionally blocking legacy TLS clients as a way of getting people to switch before TLS 1.0/1.1 support is gone for real.

Here is a quote from their status page:

In preparation for our CDN provider deprecating TLSv1.0 and TLSv1.1 protocols, we have begun rolling brownouts for these protocols for the first ten (10) minutes of each hour.

During that window, clients accessing pypi.python.org with clients that do not support TLSv1.2 will receive an HTTP 403 with the error message "This is a brown out of TLSv1 support. TLSv1 support is going away soon, upgrade to a TLSv1.2+ capable client.".


I like this action as a good balance of hurting just as much as needed to get end users to stop putting off updates.
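
A quick way to see which protocol your own Python setup ends up negotiating against PyPI is the standard ssl module (just a small check script; the negotiated version of course also depends on the server side):
import socket
import ssl

ctx = ssl.create_default_context()
with socket.create_connection(("pypi.python.org", 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname="pypi.python.org") as tls:
        print(tls.version())   # e.g. 'TLSv1.2' if your client is up to date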

March 27, 2018 08:35 PM

March 26, 2018

Sean's IT Blog

The Virtual Horizon Podcast Episode 2 – A Conversation with Angelo Luciani

On this episode of The Virtual Horizon podcast, we’ll journey to the French Rivera for the 2017 Nutanix .Next EU conference. We’ll be joined by Angelo Luciani, Community Evangelist for Nutanix, to discuss blogging and the Virtual Design Master competition.

Nutanix has two large conferences scheduled for 2018 – .Next in New Orleans in May 2018 and .Next EU in London at the end of November 2018.

Show Credits:
Podcast music is a derivative of Boogie Woogie Bed by Jason Shaw (audionatix.com) Licensed under Creative Commons: By Attribution 3.0 License http://creativecommons.org/licenses/by/3.0/

by seanpmassey at March 26, 2018 01:00 PM

March 17, 2018

LZone - Sysadmin

Puppet Agent Settings Issue

Experienced a strange puppet agent 4.8 configuration issue this week. To distribute the agent runs over time and even out puppet master load, I wanted to configure the splay settings properly. There are two settings:
  • A boolean "splay" to enable/disable splaying
  • A range limiter "splayLimit" to control the randomization
What first confused me was that "splay" is not on by default. Of course, when using the open source version it makes sense to have it off; having it on by default sounds more like an enterprise feature :-)

Regardless of the default, after deploying an agent config with settings like this
[agent]
runInterval = 3600
splay = true
splayLimit = 3600
... nothing happened. Runs were still not randomized. Checking the active configuration with
# puppet config print | grep splay
splay=false
splayLimit=1800
revealed that my config settings were not taking effect at all. What was utterly confusing is that even the runInterval was reported as 1800 (which is the default value). But while the splay just did not work, the effective runInterval was 3600!

After hours of debugging it, I happened to read the puppet documentation section that covers the config sections like [agent] and [main]. It says that [main] configures global settings and other sections can override the settings in [main], which makes sense.

But it just doesn't work this way. In the end the solution was using [main] as config section instead of [agent]:
[main]
runInterval=3600
splay=true
splayLimit=3600
and with this config "puppet config print" finally reported the settings as effective and the runtime behaviour had the expected randomization.

Maybe I misread something somewhere, but this is really hard to debug. And INI files are not really helpful in Unix; overriding works better with default files and with drop-in directories.

March 17, 2018 08:38 PM

March 14, 2018

The Lone Sysadmin

No VMware NSX Hardware Gateway Support for Cisco

I find it interesting, as I’m taking my first real steps into the world of VMware NSX, that there is no Cisco equipment supported as a VMware NSX hardware gateway (VTEP). According to the HCL on March 13th, 2018 there is a complete lack of “Cisco” in the “Partner” category: I wonder how that works out […]

The post No VMware NSX Hardware Gateway Support for Cisco appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at March 14, 2018 05:26 PM