Planet SysAdmin


April 25, 2017

Chris Siebenmann

What we need from an Illumos-based distribution

It started on Twitter:

@thatcks: Current status: depressing myself by looking at the status of Illumos distributions. There's nothing else like OmniOS out there.

@ptribble: So what are the values of OmniOS that are important to you and you feel aren't present in other illumos distributions?

This is an excellent question from Peter Tribble of Tribblix and it deserves a more elaborate answer than I could really give it on Twitter. So here's my take on what we need from an Illumos distribution.

  • A traditional Unix server environment that will do NFS fileservice with ZFS, because we already have a lot of tooling and expertise in running such an Illumos-based environment and our current environment has a very particular setup due to local needs. If we have to completely change how we design and operate NFS fileservice (for example to move to a more 'storage appliance' environment), the advantages of continuing with Illumos mostly go away. If we're going to have to drastically redesign our environment no matter what, maybe the simpler redesign is 'move to ZFS on FreeBSD doing NFS service as a traditional Unix server'.

  • A distribution that is actively 'developed', by which I mean that it incorporates upstream changes from Illumos and other sources on a regular basis and the installer is updated as necessary and so on. Illumos is not a static thing; there's important evolution in ZFS fixes and improvements, hardware drivers, and various other components.

  • Periodic stable releases that are supported with security updates and important bug fixes for a few years but that do not get wholesale rolling updates from upstream Illumos. We need this because testing, qualifying, and updating to a whole new release (with a wide assortment of changes) is a very large amount of work and risk. We can't possibly do it on a regular basis, such as every six months; even once a year is too much. Two years of support is probably our practical minimum.

    We could probably live with 'security updates only', although it'd be nice to be able to get high priority bug fixes as well (here I'm thinking of things that will crash your kernel or cause ZFS corruption, which are very close to 'must fix somehow' issues for people running ZFS fileservers).

    We don't need the ability to upgrade from stable release to stable release. For various reasons we're probably always going to upgrade by installing from scratch on new system disks and then swapping things around in downtimes.

  • An installer that lets us install systems from USB sticks or DVD images and doesn't require complicated backend infrastructure. We're not interested in big complex environments where we have to PXE-boot initial installs of servers, possibly with various additional systems to auto-configure them and tell them what to do.

In a world with pkgsrc and similar sources of pre-built and generally maintained packages, I don't think we need the distribution itself to have very many packages (I'm using the very specific 'we' here, meaning my group and our fileservers). Sure, it would be convenient for us if the distribution shipped with Postfix and Python and a few other important things, but it's not essential the way the core of Illumos is. While it would be ideal if the distribution owned everything important the way that Debian, FreeBSD, and so on do, it doesn't seem like Illumos distributions are going to have that kind of person-hours available, even for a relatively small package collection.

With that said I don't need for all packages to come from pkgsrc or whatever; I'm perfectly happy to have a mix of maintained packages from several sources, including the distribution in either their main source or an 'additional packages we use' repository. Since there's probably not going to be a plain-server NFS fileserver focused Illumos distribution, I'd expect any traditional Unix style distribution to have additional interests that lead to them packaging and maintaining some extra packages, whether that's for web service or X desktops or whatever.

(I also don't care about the package format that the distribution uses. If sticking with IPS is the easy way, that's fine. Neither IPS nor pkgsrc are among my favorite package management systems, but I can live with them.)

Out of all of our needs, I expect the 'stable releases' part to be the biggest problem. Stable releases are a hassle for distribution maintainers (or really for maintainers of anything); you have to maintain multiple versions and you may have to backport a steadily increasing amount of things over time. The amount of pain involved in them is why we're probably willing to live with only security updates for a relatively minimal set of packages and not demand backported bugfixes.

(In practice we don't expect to hit new bugs once our fileservers have gone into production and been stable, although it does happen every once in a while.)

Although 10G Ethernet support is very important to us in general, I'm not putting it on this list because I consider it a general Illumos issue, not something that's going to be specific to any particular distribution. If Illumos as a whole has viable 10G Ethernet for us, any reasonably up-to-date distribution should pick it up, and we don't want to use a distribution that's not picking those kinds of things up.

Sidebar: My current short views on other Illumos distributions

Peter Tribble also asked what was missing in existing Illumos distributions. Based on an inspection of the Illumos wiki's options, I can split the available distributions into three sets:

  • OpenIndiana and Tribblix are developed and more or less traditional Unix systems, but don't appear to have stable releases that are maintained for a couple of years; instead there are periodic new releases with all changes included.

  • SmartOS, Nexenta, and napp-it are Illumos based but as far as I can tell aren't in the form of a traditional Unix system. (I'm not sure if napp-it is actively updated, but the other two are.)

  • The remaining distributions don't seem to be actively developed and may not have maintained stable releases either (I didn't look deeply).

Hopefully you can see why OmniOS hit a sweet spot for us; it is (or was) actively maintained, it has 'long term stable' releases that are supported for a few years, and you get a traditional Unix OS environment and a straightforward installation system.

by cks at April 25, 2017 02:19 AM

April 24, 2017

ma.ttias.be

cron.weekly issue #77: OpenStack, Moby, Caddy, Devuan, Linuxkit, Tmux, Jenkins & more

It's been a busy week for open source; it's been a while since there were this many guides in cron.weekly.

Interesting screencasts on Jenkins, Intel pulls out of OpenStack funding, Docker goes and renames (partly) to Moby, you can build an SSH tunnel between 2 unrelated VMs using the shared CPU cache on public clouds, ...

It's been a busy week, both personally and in the open source world. There's so much good content to share this week; I hope you find something you like among the links!

Source: cron.weekly issue #77: OpenStack, Moby, Caddy, Devuan, Linuxkit, Tmux, Jenkins & more

The post cron.weekly issue #77: OpenStack, Moby, Caddy, Devuan, Linuxkit, Tmux, Jenkins & more appeared first on ma.ttias.be.

by Mattias Geniar at April 24, 2017 07:30 AM

Chris Siebenmann

Corebird and coming to a healthier relationship with Twitter

About two months ago I wrote about my then views on the Corebird Twitter client. In that entry I said that Corebird was a great client for checking in on Twitter and skimming through it, but wasn't my preference for actively following Twitter; for that I still wanted Choqok for various reasons. You know what? It turns out that I was wrong. I now feel that Corebird is both a better Linux Twitter client in general and that it's a better Twitter client for me in specific. Unsurprisingly, it's become the dominant Twitter client that I use.

Corebird is mostly a better Twitter client in general because it has much better support for modern Twitter features, even if it's not perfect and there are things from Choqok that I wish it did (even as options). It has niceties like displaying quoted tweets inline and letting me easily and rapidly look at attached media (pictures, animations, etc), and it's just more fluid in general (even if it has some awkward and missing bits, like frankly odd scrolling via the keyboard). Corebird has fast, smooth updates of new tweets more or less any time you want, and it can transparently pull in older tweets as you scroll backwards to a relatively impressive level. Going back to Choqok now actually feels clunky and limited, even though it has features that I theoretically rather want (apart from the bit where I know that several of those features are actually bad for me).

(Corebird's ability to display more things inline makes a surprising difference when skimming Twitter, because I can see more without having to click on links and spawn things in my browser and so on. I also worked out how to make Corebird open up multiple accounts on startup; it's hiding in the per-account settings.)

Corebird is a better Twitter client for me in specific because it clearly encourages me to have a healthier approach to Twitter, the approach I knew I needed a year ago. It's not actually good for me to have a Twitter client open all the time and to try to read everything, and it turns out that Corebird's lack of some features actively encourages me not to try to do this. There's no visible unread count to prod me to pay attention, there is no marker of read versus unread to push me into trying to read all of the unread tweets one by one, and so on. That Corebird starts fast and lets me skim easily (and doesn't hide itself away in the system tray) also encourages me to close it and not pay attention to Twitter for a while. If I do keep Corebird running and peek in periodically, its combination of features makes it easy and natural to skim, rapidly scan, or outright skip the new tweets, so I'm pretty sure I spend less time catching up than I did in Choqok.

(Fast starts matter because I know I can always come back easily if I really want to. As I have it configured, Choqok took quite a while to start up and there were side effects of closing it down with unread messages. In Corebird, startup is basically instant and I know that I can scroll backwards through my timeline to where I was, if I care enough. Mostly I don't, because I'm looking at Twitter to skim it for a bit, not to carefully read everything.)

The net result is that Corebird has turned checking Twitter into what is clearly a diversion, instead of something to actively follow. I call up Corebird when I want to spend some time on Twitter, and then if things get busy there is nothing to push me to get back to it and maybe I can quit out of it in order to make Twitter be even further away (sometimes Corebird helps out here by quietly crashing). This is not quite the 'stop fooling yourself you're not multitasking here' experience that using Twitter on my phone is, but it feels closer to it than Choqok did. Using Corebird has definitely been part of converting Twitter from a 'try to read it all' experience to a 'dip in and see what's going on' one, and the latter is much better for me.

(It turns out that I was right and wrong when I wrote about how UI details mattered for my Twitter experience. Back then I said that a significantly different client from Choqok would mean that my Twitter usage would have to change drastically. As you can see, I was right about that; my Twitter usage has changed drastically. I was just wrong about that necessarily being a bad thing.)

by cks at April 24, 2017 04:30 AM

April 23, 2017

Chris Siebenmann

How I rebased changes on top of other rebased changes in Git

A while ago I wrote an entry on some git repository changes that I didn't know how to do well. One of them was rebasing my own changes on top of a repository that itself had been rebased; in the comments, Aristotle Pagaltzis confirmed that his Stackoverflow answer about this was exactly what I wanted. Since I've now actually gone through this process for the first time, I want to write down the details for myself, with commentary to explain how and why everything works. Much of this commentary will seem obvious to people who use Git a lot, but it reflects some concerns and confusions that I had at the time.

First, the repositories involved. rc is the master upstream repository for Byron Rakitzis's Unix reimplementation of Tom Duff's rc shell. It is not rebased; infrequent changes flow forward as normal for a public Git repo. What I'm going to call muennich-rc is Bert Münnich's collection of interesting modifications on top of rc; it is periodically rebased, either in response to changes in rc or just as Bert Münnich does development on it. Finally I have my own repository with my own local changes on top of muennich-rc. When muennich-rc rebases, I want to rebase my own changes on top of that rebase.

I start in my own repository, before fetching anything from upstream:

  1. git branch old-me

    This creates a branch that captures the initial state of my tree. It's not used in the rebasing process; instead it's a safety measure so that I can reset back to it if necessary without having to consult something like the git reflog. Because I've run git branch without an additional argument, old-me is equivalent to master until I do something to change master.

  2. git branch old-muennich muennich/master

    muennich/master is the upstream for muennich-rc. Creating a branch captures the (old) top commit for muennich-rc that my changes are on top of.

    Because both old-me and old-muennich have been created as plain ordinary git branches, not upstream tracking branches, their position won't change regardless of fetching and other changes during the rebase. I'm really using them as bookmarks for specific commits instead of actual branches that I will add commits on top of.

    (I'm sure this is second nature to experienced Git people, but when I made old-muennich I had to pause and convince myself that what commit it referred to wasn't going to change later, the way that master changes when you do a 'git pull'. Yes, I know, 'git pull' does more than 'git fetch' does and the difference is important here.)

  3. git fetch

    This pulls in the upstream changes from muennich-rc, updating muennich/master to refer to the current top commit of muennich-rc. It's now possible to do things like 'git diff old-muennich muennich/master' to see any differences between the old muennich-rc and the newly updated version.

    (Because I did git fetch instead of git pull or anything else, only muennich/master changed. In particular, master has not changed and is still the same as old-me.)

  4. git rebase --onto muennich/master old-muennich master

    This does all the work (well, I had to resolve and merge some conflicts). What it means is 'take all of the commits that go from old-muennich to master and rebase them on top of muennich/master; afterward, set the end result to be master'.

    (If I omitted the old-muennich argument, I would be trying to rebase both my local changes and the old upstream changes from muennich-rc on top of the current muennich-rc. Depending on the exact changes involved in muennich-rc's rebasing, this could have various conflicts and bad effects (for instance, reintroducing changes that Bert Münnich had decided to discard). There is a common ancestor in the master rc repository, but there could be a lot of changes between there and here.)

    The local changes that I added to the old version of muennich-rc are exactly the commits from old-muennich to master (ie, they're what would be shown by 'git log old-muennich..master', per the git-rebase manpage), so I'm putting my local commits on top of muennich/master. Since the current muennich/master is the top of the just-fetched new version of muennich-rc, I'm putting my local commits on top of the latest upstream rebase. This is exactly what I want to do; I'm rebasing my commits on top of an upstream rebase.

  5. After the dust has settled, I can get rid of the two branches I was using as bookmarks:

    git branch -D old-me
    git branch -D old-muennich
    

    I have to use -D because as far as git is concerned these branches both have unmerged changes. They're unmerged because these branches have both been orphaned by the combination of the muennich-rc rebase and my rebase.

Because I don't care (much) about the old version of my changes that are on top of the old version of muennich-rc, doing a rebase instead of a cherry-pick is the correct option. Following my realization on cherry-picking versus rebasing, there are related scenarios where I might want to cherry-pick instead, for example if I wasn't certain that I liked some of the changes in the rebased muennich-rc and I might want to fall back to the old version. Of course in this situation I could get the same effect by keeping the two branches after the rebase instead of deleting them.
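
Condensed into one place, the whole sequence looks like this (repository and remote names as described above; this is just a recap of the steps in the entry, not a different procedure):

    git branch old-me                          # bookmark my current tip
    git branch old-muennich muennich/master    # bookmark the old upstream tip
    git fetch                                  # update muennich/master to the new rebase
    git rebase --onto muennich/master old-muennich master
    git branch -D old-me old-muennich          # drop the bookmarks afterward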

by cks at April 23, 2017 05:03 AM

April 22, 2017

Pantz.org

DNS over TLS. Secure the DNS!

Secure the Web

A while back we had the "secure the web" initiative, where everyone was inspired to enable encryption (https) on their websites. This was so we could thwart things like eavesdropping and content hijacking. In 2016, about half of website visits were https. This is great and things seem to be only getting better. ISPs cannot see the content in https traffic. Not seeing your traffic content anymore makes them sad. What makes them happy? They can still see all of your DNS requests.

Your ISP can see the websites you visit

Every ISP assigns you some of their DNS servers to use when you connect to them for your internet connection. Every time you type a website name into your browser bar, the request goes to their DNS servers to look up a number called an IP address. After this happens an IP address is returned to your computer, and the connection to the website is made. Your ISP now has a log of the website you requested attached to the rest of the information they have about you. Then they build profiles about you and sell that info to 3rd parties to target advertising to you in many different ways. Think you'll be slick and switch out their DNS servers with someone else's, like Google's free DNS servers (8.8.8.8)? Think again. Any request through your ISP to any DNS server on the internet is unencrypted. Your ISP can slurp up all the same requests and get the same info they did when you were using their DNS servers, just like when they could see all of your content with http before https. This also means that your DNS traffic could possibly be intercepted and hijacked by other people or entities. TLS prevents this.

Securing the DNS

The thing that secures https is Transport Layer Security (TLS). It is a set of cryptographic protocols that provides communications security over a computer network. Now that we are beginning to secure websites, I think it is high time we secured the DNS as well. Others seem to agree. In 2016, RFC 7858 and RFC 8094 were submitted to the Internet Engineering Task Force (IETF); they describe the use of DNS over TLS and DNS over DTLS. Hopefully these will eventually become standards and all DNS traffic will be more secure in transit. Can you have DNS over TLS today? Yes you can!

Trying DNS over TLS now

DNS over TLS is in its infancy currently, but there are ways to try it out now. You could try using Stubby, a program that acts as a local DNS Privacy stub resolver (using DNS-over-TLS). You will have to compile Stubby on Linux or OS X to use it. You could also set up your own DNS server at home and point it to some upstream forwarders that support DNS over TLS. This is what I have done to start testing this. I use Unbound as my local DNS server on my lan for all of my client machines. I used the configuration settings provided by our friends over at Calomel.org to set up my Unbound server to use DNS over TLS. Here is a list of other open source DNS software that can use DNS over TLS. With my Unbound setup, all of my DNS traffic is secured from interception and modification. So how is my testing going?
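
For reference, the relevant part of such an Unbound forwarding setup looks roughly like this. This is a minimal sketch: the upstream address is a placeholder (not one of the actual servers from Calomel's write-up), and the ssl-upstream option name is the one used by the Unbound versions current at the time of writing.

    server:
        # send queries to the forwarders over TLS only
        ssl-upstream: yes
    forward-zone:
        name: "."
        # placeholder DNS-over-TLS resolver; substitute a provider you trust
        forward-addr: 203.0.113.53@853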

The early days

Since this is not an IETF standard yet, there are not a lot of providers of DNS over TLS resolvers. I have had to rearrange my list of DNS over TLS providers a few times when some of the servers were just not resolving hostnames. The latency is also higher than using your local ISP's DNS servers or someone's like Google's DNS servers. This is not very noticeable since my local DNS server caches the lookups. I have a feeling the generous providers of these DNS over TLS services are being overwhelmed and cannot handle the load of the requests. This is where bigger companies come into play.

Places like Google or OpenDNS do not support DNS over TLS yet, but I'm hoping that they will get on board with this. Google especially since they have been big proponents of making all websites https. They also have the infrastructure to pull this off. Even if someone like Google turned this on, that means they get your DNS traffic instead of your ISP. Will this ever end?

Uggh, people can still see my traffic

Let's face it, if you're connected to the internet, at some point someone gets to see where you're going. You just have to choose who you want/trust to give this information to. If you point your DNS servers to Google they get to see your DNS requests. If I point my DNS at these test DNS over TLS servers then they get to see my DNS traffic. It seems like the lesser of two evils to send your DNS to third-party DNS servers rather than to your ISP. If you use your ISP's DNS servers they know the exact name attached to the IP address they assigned you and the customer that is making the query. I have been holding off telling you the bad news: https SNI will still give up the domain names you visit.

Through all of this, even if you point your DNS traffic to a DNS over TLS server, your ISP can still see many of the sites you go to. This is thanks to something in https called Server Name Indication (SNI). When you make a connection to an https-enabled website there is a process called a handshake. This is the exchange of information before the encryption starts. During this unencrypted handshake (ClientHello), one of the things that is sent by you is the remote host name. This allows the server on the other end to choose the appropriate certificate based on the requested host name. This matters when multiple virtual hosts reside on the same server, which is a very common setup. Unfortunately, your ISP can see this, slurp it up, and log it to your account/profile. So now what?

Would a VPN help? Yes, but remember that now your DNS queries go to your VPN provider. What is nice is that your ISP will not see any of your traffic anymore. That pesky SNI issue mentioned above goes away when using a VPN. But now your trusted endpoint is your VPN provider; they can log all the sites you go to. So choose wisely when picking a VPN provider. Read their policy on saving logs, and choose one that will allow you to pay with Bitcoin so you can be as anonymous as possible. With a VPN provider you also have to be careful about DNS leaking. If your VPN client is not configured right, or you forget to turn it on, or any of the myriad other ways a VPN can fail, your traffic will go right back to your ISP.

Even VPN's don't make you anonymous

So you have encrypted your DNS and web traffic with TLS and you're using a VPN. Good for you; your privacy is now a bit better, but you still don't have anonymity. You're still being tracked, this time by ad networks and the services you use. I'm not going to go into this as many other people have written on this topic. Just know that you're being tracked one way or another.

I know this all seems hopeless, but securing the web's infrastructure bit by bit helps improve privacy just a little more. DNS, like http, is unencrypted. There was a big push to get websites to encrypt their data; now the same attention needs to be given to DNS.

by Pantz.org at April 22, 2017 09:59 PM

April 21, 2017

ma.ttias.be

Canada Just Ruled to Uphold Net Neutrality

Who follows their lead?

On Thursday afternoon, the Canadian Radio-television and Telecommunications Commission (CRTC), the country's federal telecom regulator, dropped a bombshell ruling on the status of net neutrality—the principle that all web services should be treated equally by providers.

[...]

The CRTC ruled that "[internet] service providers should treat data traffic equally to foster consumer choice, innovation and the free exchange of ideas".

Source: Canada Just Ruled to Uphold Net Neutrality -- Motherboard

The post Canada Just Ruled to Uphold Net Neutrality appeared first on ma.ttias.be.

by Mattias Geniar at April 21, 2017 09:03 AM

April 20, 2017

ma.ttias.be

Follow-up: MIT no longer owns their /8

More details have emerged now with regards to "MIT no longer owns 18.0.0.0/8". Turns out, it's a big move from IPv4 to IPv6 on the MIT network, where the sale of their IP addresses is being used to fund the move to IPv6.

[...]

Fourteen million of these IPv4 addresses have not been used, and we have concluded that at least eight million are excess and can be sold without impacting our current or future needs, up to the point when IPv6 becomes universal and address scarcity is no longer an issue. The Institute holds a block of 20 times 10^30 (20 nonillion) IPv6 addresses.

As part of our upgrade to IPv6, we will be consolidating our in-use IPv4 address space to facilitate the sale of MIT’s excess IPv4 capacity. Net proceeds from the sale will cover our network upgrade costs, and the remainder will provide a source of endowed funding for the Institute to use in furthering its academic and research mission.

Source: Next Generation MITnet

The post Follow-up: MIT no longer owns their /8 appeared first on ma.ttias.be.

by Mattias Geniar at April 20, 2017 11:53 PM

April 19, 2017

That grumpy BSD guy

Forcing the password gropers through a smaller hole with OpenBSD's PF queues

While preparing material for the upcoming BSDCan PF and networking tutorial, I realized that the pop3 gropers were actually not much fun to watch anymore. So I used the traffic shaping features of my OpenBSD firewall to let the miscreants inflict some pain on themselves. Watching logs became fun again.

Yes, in between a number of other things I am currently in the process of creating material for a new and hopefully better PF and networking session.

I've been fishing for suggestions for topics to include in the tutorials on relevant mailing lists, and one suggestion that keeps coming up (even though it's actually covered in the existing slides as well as The Book of PF) is using traffic shaping features to punish undesirable activity, such as the website-abuse scenario Dan suggested.

What Dan had in mind here may very well end up in the new slides, but in the meantime I will show you how to punish abusers of essentially any service with the tools at hand in your OpenBSD firewall.

Regular readers will know that I'm responsible for maintaining a set of mail services including a pop3 service, and that our site sees pretty much round-the-clock attempts at logging on to that service with user names that come mainly from the local part of the spamtrap addresses that are part of the system to produce our hourly list of greytrapped IP addresses.

But do not let yourself be distracted by this bizarre collection of items that I've maintained and described in earlier columns. The actual useful parts of this article follow - take this as a walkthrough of how to mitigate a wide range of threats and annoyances.

First, analyze the behavior that you want to defend against. In our case that's fairly obvious: we have a service that's getting a volume of unwanted traffic, and looking at our logs, the attempts come fairly quickly with a number of repeated attempts from each source address. This is similar enough to both the traditional ssh bruteforce attacks and, for that matter, to Dan's website scenario that we can reuse some of the same techniques in all of the configurations.

I've written about the rapid-fire ssh bruteforce attacks and their mitigation before (and of course it's in The Book of PF) as well as the slower kind where those techniques actually come up short. The traditional approach to ssh bruteforcers has been to simply block their traffic, and the state-tracking features of PF let you set up overload criteria that add the source addresses to the table that holds the addresses you want to block.

I have rules much like the ones in the example in place where I have an SSH service running, and those bruteforce tables are never totally empty.
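
For readers without that example at hand, a minimal version of the traditional ssh overload setup looks roughly like this (the connection limits here are illustrative, not the ones from the referenced ruleset):

    table <bruteforce> persist
    block quick from <bruteforce>
    pass in on egress proto tcp to port ssh flags S/SA keep state \
        (max-src-conn 15, max-src-conn-rate 5/3, overload <bruteforce> flush global)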

For the system that runs our pop3 service, we also have a PF ruleset in place with queues for traffic shaping. For some odd reason that ruleset is fairly close to the HFSC traffic shaper example in The Book of PF, and it contains a queue that I set up mainly as an experiment to annoy spammers (as in, the ones that are already for one reason or the other blacklisted by our spamd).

The queue is defined like this:

   queue spamd parent rootq bandwidth 1K min 0K max 1K qlimit 300

Yes, that's right. A queue with a maximum throughput of 1 kilobit per second. I have been warned that this is small enough that the code may be unable to strictly enforce that limit due to the timer resolution in the HFSC code. But that didn't keep me from trying.

And now that I had another group of hosts that I wanted to just be a little evil to, why not let the password gropers and the spammers share the same small patch of bandwidth?

Now a few small additions to the ruleset are needed for the good to put the evil to the task. We start with a table to hold the addresses we want to mess with. Actually, I'll add two, for reasons that will become clear later:

table <longterm> persist counters
table <popflooders> persist counters 

 
The rules that use those tables are:

block drop log (all) quick from <longterm> 


pass in quick log (all) on egress proto tcp from <popflooders> to port pop3 flags S/SA keep state \ 
(max-src-conn 2, max-src-conn-rate 3/3, overload <longterm> flush global, pflow) set queue spamd 

pass in log (all) on egress proto tcp to port pop3 flags S/SA keep state \ 
(max-src-conn 5, max-src-conn-rate 6/3, overload <popflooders> flush global, pflow) 
 
The last one lets anybody connect to the pop3 service, but any one source address can have only five simultaneous connections open, and at a rate of six over three seconds.

Any source that trips up one of these restrictions is overloaded into the popflooders table, the flush global part means any existing connections that source has are terminated, and when they get to try again, they will instead match the quick rule that assigns the new traffic to the 1 kilobit queue.

The quick rule here has even stricter limits on the number of allowed simultaneous connections, and this time any breach will lead to membership of the longterm table and the block drop treatment.

For the longterm table I already had a four week expiry in place (see man pfctl for details on how to do that), and I haven't gotten around to deciding what, if any, expiry I will set up for the popflooders.
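
Setting such an expiry is a one-line pfctl command, typically run from cron; a sketch, with four weeks expressed in seconds:

    pfctl -t longterm -T expire 2419200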

The results were immediately visible. Monitoring the queues using pfctl -vvsq shows the tiny queue works as expected:

 queue spamd parent rootq bandwidth 1K, max 1K qlimit 300
  [ pkts:     196136  bytes:   12157940  dropped pkts: 398350 bytes: 24692564 ]
  [ qlength: 300/300 ]
  [ measured:     2.0 packets/s, 999.13 b/s ]


and looking at the pop3 daemon's log entries, a typical encounter looks like this:

Apr 19 22:39:33 skapet spop3d[44875]: connect from 111.181.52.216
Apr 19 22:39:33 skapet spop3d[75112]: connect from 111.181.52.216
Apr 19 22:39:34 skapet spop3d[57116]: connect from 111.181.52.216
Apr 19 22:39:34 skapet spop3d[65982]: connect from 111.181.52.216
Apr 19 22:39:34 skapet spop3d[58964]: connect from 111.181.52.216
Apr 19 22:40:34 skapet spop3d[12410]: autologout time elapsed - 111.181.52.216
Apr 19 22:40:34 skapet spop3d[63573]: autologout time elapsed - 111.181.52.216
Apr 19 22:40:34 skapet spop3d[76113]: autologout time elapsed - 111.181.52.216
Apr 19 22:40:34 skapet spop3d[23524]: autologout time elapsed - 111.181.52.216
Apr 19 22:40:34 skapet spop3d[16916]: autologout time elapsed - 111.181.52.216


Here the miscreant comes in way too fast and only manages to get five connections going before they're shunted to the tiny queue to fight it out with known spammers for a share of bandwidth.

I've been running with this particular setup since Monday evening around 20:00 CEST, and by late Wednesday evening the number of entries in the popflooders table had reached approximately 300.

I will decide on an expiry policy at some point, I promise. In fact, I welcome your input on what the expiry period should be.

One important takeaway from this, and possibly the most important point of this article, is that it does not take a lot of imagination to retool this setup to watch for and protect against undesirable activity directed at essentially any network service.

You pick the service and the ports it uses, then figure out what are the parameters that determine what is acceptable behavior. Once you have those parameters defined, you can choose to assign to a minimal queue like in this example, block outright, redirect to something unpleasant or even pass with a low probability.

All of those possibilities are part of the normal pf.conf toolset on your OpenBSD system. If you want, you can supplement these mechanisms with a bit of log file parsing that produces output suitable for feeding to pfctl to add to the table of miscreants. The only limits are, as always, the limits of your imagination (and possibly your programming abilities).
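
If you go the log-parsing route, feeding addresses into a table boils down to pfctl's table commands; for example (the address here is just an illustration):

    pfctl -t popflooders -T add 192.0.2.1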

FreeBSD users will be pleased to know that something similar is possible on their systems too, only substituting the legacy ALTQ traffic shaping with its somewhat arcane syntax for the modern queues rules in this article.

Will you be attending our PF and networking session in Ottawa, or will you want to attend one elsewhere later? Please let us know at the email address in the tutorial description.



Update 2017-04-23: A truly unexpiring table, and downloadable datasets made available

Soon after publishing this article I realized that what I had written could easily be taken as a promise to keep a collection of POP3 gropers' IP addresses around indefinitely, in a table where the entries never expire.

Table entries do not expire unless you use a pfctl(8) command like the ones mentioned in the book and other resources I referenced earlier in the article, but on the other hand table entries will not survive a reboot either unless you arrange to have table contents stored to somewhere more permanent and restored from there. Fortunately our favorite toolset has a feature that implements at least the restoring part.

Changing the table definition quoted earlier to read

 table <popflooders> persist counters file "/var/tmp/popflooders"

takes care of the restoring part, and the backing up is a matter of setting up a cron(8) job to dump the current contents of the table to the file that will be loaded into the table at ruleset load.
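
A sketch of such a cron job (the schedule here is arbitrary; the path matches the table definition above):

    # root's crontab: save the current table contents once an hour
    0 * * * * /sbin/pfctl -t popflooders -T show > /var/tmp/popflooders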

Then today I made another tiny change and made the data available for download. The popflooders table is dumped at five past every full hour to pop3gropers.txt, a file designed to be read by anything that takes a list of IP addresses and ignores lines starting with the # comment character. I am sure you can think of suitable applications.

In addition, the same script does a verbose dump, including table statistics for each entry, to pop3gropers_full.txt for readers who are interested in such things as when an entry was created and how much traffic those hosts produced, keeping in mind that those hosts are not actually blocked here, only subjected to a tiny amount of bandwidth.

As it says in the comment at the top of both files, you may use the data as you please for your own purposes, for any re-publishing or integration into other data sets please contact me via the means listed in the bsdly.net whois record.
As usual I will answer any reasonable requests for further data such as log files, but do not expect prompt service and keep in mind that I am usually in the Central European time zone (CEST at the moment).

I suppose we should see this as a tiny, incremental evolution of the "Cybercrime Robot Torture As A Service" (CRTAAS) concept.

by Peter N. M. Hansteen (noreply@blogger.com) at April 19, 2017 09:45 PM

Steve Kemp's Blog

3d-Printing is cool

I've heard about 3d-printing a lot in the past, although the hype seems to have mostly died down. My view has always been "That seems cool", coupled with "Everybody says making the models is very hard", and "the process itself is fiddly & time-consuming".

I've been sporadically working on a project for a few months now which displays tram-departure times; this is part of my drive to "hardware" things with Arduino/ESP8266 devices. Most visitors to our flat have commented on it, at least once, and over time it has become gradually more and more user-friendly. Initially it was just a toy-project for myself, so everything was hard-coded in the source, but over time that changed - which I mentioned here (specifically the access-point setup):

  • When it boots up, unconfigured, it starts as an access-point.
    • So you can connect and configure the WiFi network it should join.
  • Once it's up and running you can point a web-browser at it.
    • This lets you toggle the backlight, change the timezone, and the tram-stop.
    • These values are persisted to flash so reboots will remember everything.

I've now wired up an input-button to the device too, experimenting with the different ways that a single button can carry out multiple actions:

  • Press & release - toggle the backlight.
  • Press & release twice - a double-click if you like - show a message.
  • Press, hold for 1 second, then release - re-sync the date/time & tram-data.

Anyway the software is neat, and I can't think of anything obvious to change. So let's move on to the real topic of this post: 3D Printing.

I randomly remembered that I'd heard about an online site holding 3D-models, and on a whim I searched for "4x20 LCD". That led me to this design, which is exactly what I was looking for. Just like open-source software we're now living in a world where you can get open-source hardware! How cool is that?

I had to trust the dimensions of the model, and obviously I was going to mount my new button into the box, rather than the knob shown. But having a model was great. I could download it, for free, and I could view it online at viewstl.com.

But with a model obtained the next step was getting it printed. I found a bunch of commercial companies, here in Europe, who would print a model and ship it to me, but when I uploaded the model they priced it at €90+. Too much. I'd almost lost interest when I stumbled across a site which provides a gateway into a series of individuals/companies who will print things for you, on-demand: 3dhubs.

Once again I uploaded my model, and this time I was able to select a guy in the same city as me. He printed my model for 1/3-1/4 of the price of the companies I'd found, and sent me fun pictures of the object while it was in the process of being printed.

To recap I started like this:

Then I boxed it in cardboard which looked better than nothing, but still not terribly great:

Now I've found an online case-design for free, got it printed cheaply by a volunteer (feels like the wrong word, after all I did pay him), and I have something which looks significantly more professional:

Inside it looks as neat as you would expect:

Of course the case still cost 5 times as much as the actual hardware involved (button: €0.05, processor-board €2.00 and LCD I2C display €3.00). But I've gone from being somebody who had zero experience with hardware-based projects 4 months ago, to somebody who has built a project which is functional and "pretty".

The internet really is a glorious thing. Using it for learning, and coding is good, using it for building actual physical parts too? That's something I never could have predicted a few years ago and I can see myself doing it more in the future.

Sure the case is a little rough around the edges, but I suspect it is now only a matter of time until I learn how to design my own models. An obvious extension is to add a status-LED above the switch, for example. How hard can it be to add a new hole to a model? (Hell I could just drill it!)

April 19, 2017 09:00 PM

April 18, 2017

Errata Security

Mirai, Bitcoin, and numeracy

Newsweek (the magazine famous for outing the real Satoshi Nakamoto) has a story about how a variant of the Mirai botnet is mining bitcoin. They fail to run the numbers.

The story repeats a claim by McAfee that 2.5 million devices were infected with Mirai at some point in 2016. If they were all mining bitcoin, how much money would the hackers be earning?

I bought security cameras and infected them with Mirai. A typical example of the CPU running on an IoT device is an ARM926EJ-S processor.


As this website reports, such a processor running at 1.2 GHz can mine at a rate of 0.187 megahashes/second. That's a bit fast for an IoT device; most are slower, some are faster, but we'll just use this as the average.


According to this website, the current hash-rate of all miners is around 4 million terahashes/second.


Bitcoin blocks are mined every 10 minutes, with the current (April 2017) reward set at 12.5 bitcoins per block, giving roughly 1800 bitcoins/day in reward.

The current price of bitcoin is $1191.



Okay, let's plug all these numbers in:
  •  total Mirai hash-rate = 2.5 million bots times 0.187 megahash/sec = 0.468 terahashes/second
  •  daily Bitcoin earnings = $1191 times 1800 = $2.1 million/day
  •  daily Mirai earnings = (0.468 divided by 4 million) times $2.1 million = $0.25
In other words, if the entire Mirai botnet of 2.5 million IoT devices was furiously mining bitcoin, its total earnings would be $0.25 (25 cents) per day.
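
The same arithmetic can be reproduced from a shell with bc(1):

    $ echo '2500000 * 0.187 / 1000000' | bc -l      # whole-botnet rate, ~0.468 TH/s
    $ echo '0.468 / 4000000 * 1191 * 1800' | bc -l  # earnings, ~0.25 dollars/day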

Conclusion

If 2.5 million IoT devices mine Bitcoin, they'd earn in total 25 pennies per day. It's inconceivable that anybody would add bitcoin mining to the Mirai botnet other than as a joke.




Bonus: A single 5 kilogram device you hold in your hand can mine at 12.5 terahashes/second, or 25 times the hypothetical botnet, for $1200.

by Robert Graham (noreply@blogger.com) at April 18, 2017 06:49 PM

April 15, 2017

[bc-log]

Using stunnel and TinyProxy to obfuscate HTTP traffic

Recently there has been a lot of coverage in both tech and non-tech news outlets about internet privacy and how to prevent snooping both from service providers and governments. In this article I am going to show one method of anonymizing internet traffic: using a TLS-enabled HTTP/HTTPS proxy.

In this article we will walk through using stunnel to create a TLS tunnel with an instance of TinyProxy on the other side.

How does this help anonymize internet traffic

TinyProxy is an HTTP & HTTPS proxy server. By setting up a TinyProxy instance on a remote server and configuring our HTTP client to use this proxy, we can route all of our HTTP & HTTPS traffic through that remote server. This is a useful technique for getting around network restrictions that might be imposed by ISPs or governments.

However, it's not enough to simply route HTTP/HTTPS traffic to a remote server. This in itself does not add any additional protection to the traffic. In fact, with an out of the box TinyProxy setup, all of the HTTP traffic to TinyProxy would still be unencrypted, leaving it open to packet capture and inspection. This is where stunnel comes into play.

I've featured it in earlier articles but for those who are new to stunnel, stunnel is a proxy that allows you to create a TLS tunnel between two or more systems. In this article we will use stunnel to create a TLS tunnel between the HTTP client system and TinyProxy.

By using a TLS tunnel between the HTTP client and TinyProxy our HTTP traffic will be encrypted between the local system and the proxy server. This means anyone trying to inspect HTTP traffic will be unable to see the contents of our HTTP traffic.

This technique is also useful for reducing the chances of a man-in-the-middle attack to HTTPS sites.

I say reducing because one of the caveats of this article is that, while routing our HTTP & HTTPS traffic through a TLS-tunneled HTTP proxy will help obfuscate and anonymize our traffic, the system running TinyProxy is still susceptible to man-in-the-middle attacks and HTTP traffic snooping.

Essentially, with this article, we are not focused on solving the problem of unencrypted traffic; we are simply moving our problem to a network where no one is looking. This is essentially the same approach as VPN service providers take; the advantage of running your own proxy is that you control the proxy.

Now that we understand the end goal, let's go ahead and get started with the installation of TinyProxy.

Installing TinyProxy

The installation of TinyProxy is fairly easy and can be accomplished using the apt-get command on Ubuntu systems. Let's go ahead and install TinyProxy on our future proxy server.

server: $ sudo apt-get install tinyproxy

Once the apt-get command finishes, we can move to configuring TinyProxy.

Configuring TinyProxy

By default TinyProxy starts up listening on all interfaces for a connection to port 8888. Since we don't want to leave our proxy open to the world, let's change this by configuring TinyProxy to listen to the localhost interface only. We can do this by modifying the Listen parameter within the /etc/tinyproxy.conf file.

Find:

#Listen 192.168.0.1

Replace With:

Listen 127.0.0.1

Once complete, we will need to restart the TinyProxy service in order for our change to take effect. We can do this using the systemctl command.

server: $ sudo systemctl restart tinyproxy

After systemctl completes, we can validate that our change is in place by checking whether port 8888 is bound correctly using the netstat command.

server: $ netstat -na
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 127.0.0.1:8888          0.0.0.0:*               LISTEN

Based on the netstat output it appears TinyProxy is set up correctly. With that done, let's go ahead and set up stunnel.

Installing stunnel

Just like TinyProxy, the installation of stunnel is as easy as executing the apt-get command.

server: $ sudo apt-get install stunnel

Once apt-get has finished we will need to enable stunnel by editing the /etc/default/stunnel4 configuration file.

Find:

# Change to one to enable stunnel automatic startup
ENABLED=0

Replace:

# Change to one to enable stunnel automatic startup
ENABLED=1

By default on Ubuntu, stunnel is installed in a disabled mode. By changing the ENABLED flag from 0 to 1 within /etc/default/stunnel4, we are enabling stunnel to start automatically. However, our configuration of stunnel does not stop there.

Our next step with stunnel will involve defining our TLS tunnel.

TLS Tunnel Configuration (Server)

By default stunnel will look in /etc/stunnel for any files that end in .conf. In order to configure our TLS stunnel we will be creating a file named /etc/stunnel/stunnel.conf. Once created, we will insert the following content.

[tinyproxy]
accept = 0.0.0.0:3128
connect = 127.0.0.1:8888
cert = /etc/ssl/cert.pem
key = /etc/ssl/key.pem

The contents of this configuration file are fairly straightforward, but let's go ahead and break down what each of these items means. We will start with the accept option.

The accept option is similar to the listen option from TinyProxy. This setting will define what interface and port stunnel will listen to for incoming connections. By setting this to 0.0.0.0:3128 we are defining that stunnel should listen on all interfaces on port 3128.

The connect option is used to tell stunnel what IP and port to connect to. In our case this needs to be the IP and port that TinyProxy is listening on; 127.0.0.1:8888.

An easy way to remember how accept and connect should be configured is that accept is where incoming connections should come from, and connect is where they should go to.

Our next two configuration options are closely related, cert & key. The cert option is used to define the location of an SSL certificate that will be used to establish our TLS session. The key option is used to define the location of the key used to create the SSL certificate.

We will set these to be located in /etc/ssl and in the next step, we will go ahead and create both the key and certificate.

Creating a self-signed certificate

The first step in creating a self-signed certificate is to create a private key. To do this we will use the following openssl command.

server: $ sudo openssl genrsa -out /etc/ssl/key.pem 4096

The above will create a 4096-bit RSA key. From this key, we will create a public certificate using another openssl command. During the execution of this command there will be a series of questions. These questions are used to populate the certificate with organization information. Since we will be using this for our own purposes we will not worry too much about the answers to these questions.

server: $ sudo openssl req -new -x509 -key /etc/ssl/key.pem -out /etc/ssl/cert.pem -days 1826
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:Arizona
Locality Name (eg, city) []:Phoenix
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Example.com
Organizational Unit Name (eg, section) []:
Common Name (e.g. server FQDN or YOUR name) []:proxy.example.com
Email Address []:proxy@example.com

Once the questions have been answered the openssl command will create our certificate file.

After both the certificate and key have been created, we will need to restart stunnel in order for our changes to take effect.

server: $ sudo systemctl restart stunnel4

At this point we have finished configuring the proxy server. We now need to setup our client.

Setting up stunnel (client)

Like the proxy server, our first step in setting up stunnel on our client is installing it with the apt-get command.

client: $ sudo apt-get install stunnel

We will also once again need to enable stunnel within the /etc/default/stunnel4 configuration file.

Find:

# Change to one to enable stunnel automatic startup
ENABLED=0

Replace:

# Change to one to enable stunnel automatic startup
ENABLED=1

After enabling stunnel we will need to restart the service with the systemctl command.

client: $ sudo systemctl restart stunnel4

From here we can move on to configuring the stunnel client.

TLS Tunnel Configuration (Client)

The configuration of stunnel in "client-mode" is a little different than the "server-mode" configuration we set earlier. Let's go ahead and add our client configuration into the /etc/stunnel/stunnel.conf file.

client = yes

[tinyproxy]
accept = 127.0.0.1:3128
connect = 192.168.33.10:3128
verify = 4
CAFile = /etc/ssl/cert.pem

As we did before, let's break down the configuration options shown above.

The first option is client. This option is simple: it defines whether stunnel should operate in client or server mode. By setting this to yes, we are defining that we would like to use client mode.

We covered accept and connect before and if we go back to our description above we can see that stunnel will accept connections on 127.0.0.1:3128 and then tunnel them to 192.168.33.10:3128, which is the IP and port that our stunnel proxy server is listening on.

The verify option is used to define what level of certificate validation should be performed. The option of 4 will cause stunnel to verify the remote certificate with a local certificate defined with the CAFile option. In the above example, I copied the /etc/ssl/cert.pem we generated on the server to the client and set this as the CAFile.

These last two options are important; without setting verify and CAFile, stunnel will open a TLS connection without necessarily checking the validity of the certificate. By setting verify to 4 and CAFile to the same cert.pem we generated earlier, we are giving stunnel a way to validate the identity of our proxy server. This will prevent our client from being hit with a man-in-the-middle attack.

Once again, let's restart stunnel to make our configurations take effect.

client: $ sudo systemctl restart stunnel4

With our configurations complete, let's go ahead and test our proxy.

Testing our TLS tunneled HTTP Proxy

In order to test our proxy settings we will use the curl command. While we are using a command line web client in this article, it is possible to use this same type of configuration with GUI based browsers such as Chrome or Firefox.

Before testing however, I will need to set the http_proxy and https_proxy environment variables. These will tell curl to leverage our proxy server.

client: $ export http_proxy="http://localhost:3128"
client: $ export https_proxy="https://localhost:3128"
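
As an aside (not something the rest of this walkthrough relies on), curl can also be pointed at the proxy for a single request with its -x flag, which is handy for quick tests:

client: $ curl -x http://localhost:3128 https://google.com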

With our proxy server settings in place, let's go ahead and execute our curl command.

client: $ curl -v https://google.com
* Rebuilt URL to: https://google.com/
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 3128 (#0)
* Establish HTTP proxy tunnel to google.com:443
> CONNECT google.com:443 HTTP/1.1
> Host: google.com:443
> User-Agent: curl/7.47.0
> Proxy-Connection: Keep-Alive
>
< HTTP/1.0 200 Connection established
< Proxy-agent: tinyproxy/1.8.3
<
* Proxy replied OK to CONNECT request
* found 173 certificates in /etc/ssl/certs/ca-certificates.crt
* found 692 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_ECDSA_AES_128_GCM_SHA256
*    server certificate verification OK
*    server certificate status verification SKIPPED
*    common name: *.google.com (matched)
*    server certificate expiration date OK
*    server certificate activation date OK
*    certificate public key: EC
*    certificate version: #3
*    subject: C=US,ST=California,L=Mountain View,O=Google Inc,CN=*.google.com
*    start date: Wed, 05 Apr 2017 17:47:49 GMT
*    expire date: Wed, 28 Jun 2017 16:57:00 GMT
*    issuer: C=US,O=Google Inc,CN=Google Internet Authority G2
*    compression: NULL
* ALPN, server accepted to use http/1.1
> GET / HTTP/1.1
> Host: google.com
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Location: https://www.google.com/
< Content-Type: text/html; charset=UTF-8
< Date: Fri, 14 Apr 2017 22:37:01 GMT
< Expires: Sun, 14 May 2017 22:37:01 GMT
< Cache-Control: public, max-age=2592000
< Server: gws
< Content-Length: 220
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< Alt-Svc: quic=":443"; ma=2592000; v="37,36,35"
<
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>
* Connection #0 to host localhost left intact

From the above output we can see that our connection was routed through TinyProxy.

< Proxy-agent: tinyproxy/1.8.3

And we were able to connect to Google.

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>

Given these results, it seems we have a working TLS-based HTTP/HTTPS proxy. Since this proxy is exposed on the internet, what would happen if it were found by someone simply scanning subnets for nefarious purposes?

Securing our tunnel further with PreShared Keys

As it stands today, anyone who found it could use our proxy for their own purposes. This means we need some way to ensure that only our client can use this proxy; enter PreShared Keys.

Much like an API token, stunnel supports an authentication method called PSK, or PreShared Keys. This is essentially what it sounds like: a token that has been shared between the client and the server in advance and is used for authentication. To enable PSK authentication we simply need to add the following two lines to the /etc/stunnel/stunnel.conf file on both the client and the server.

ciphers = PSK
PSKsecrets = /etc/stunnel/secrets

By setting ciphers to PSK we are telling stunnel to use PSK-based authentication. The PSKsecrets option provides stunnel with a file that contains the secrets in a clientname:token format.

In the above we specified the /etc/stunnel/secrets file. Let's go ahead and create that file and enter a sample PreShared Key.

client1:SjolX5zBNedxvhj+cQUjfZX2RVgy7ZXGtk9SEgH6Vai3b8xiDL0ujg8mVI2aGNCz
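The token itself can be any long random string shared by both sides. If you would rather generate one than type it by hand, something like the following should work (assuming openssl is installed); paste the output after the client name in /etc/stunnel/secrets on both the client and the server:

$ openssl rand -base64 48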

Since the /etc/stunnel/secrets file contains sensitive information, let's ensure that the permissions on the file are set appropriately.

$ sudo chmod 600 /etc/stunnel/secrets

By setting the permissions to 600 we are ensuring only the root user (the owner of the file) can read this file. These permissions will prevent other users from accessing the file and stumbling across our authentication credentials.

After setting permissions we will need to once again restart the stunnel service.

$ sudo systemctl restart stunnel4

With our settings complete on both the client and server, we now have a reasonably secure TLS proxy service.

Summary

In this article we covered setting up both a TLS tunnel and an HTTP proxy. As I mentioned before, this setup can be used to help hide HTTP and HTTPS traffic on a given network. However, it is important to remember that while this setup makes the client system a much trickier target, the proxy server itself could still be targeted for packet sniffing and man-in-the-middle attacks.

The key is to choose wisely where the proxy server is hosted.

Have feedback on the article? Want to add your 2 cents? Shoot a tweet out about it and tag @madflojo.


Posted by Benjamin Cane

April 15, 2017 07:30 AM

April 13, 2017

Feeding the Cloud

Automatically renewing Let's Encrypt TLS certificates on Debian using Certbot

I use Let's Encrypt TLS certificates on my Debian servers along with the Certbot tool. Since I use the "temporary webserver" method of proving domain ownership via the ACME protocol, I cannot use the cert renewal cronjob built into Certbot.

Instead, this is the script I put in /etc/cron.daily/certbot-renew:

#!/bin/bash

/usr/bin/certbot renew --quiet --pre-hook "/bin/systemctl stop apache2.service" --post-hook "/bin/systemctl start apache2.service"

pushd /etc/ > /dev/null
/usr/bin/git add letsencrypt
DIFFSTAT="$(/usr/bin/git diff --cached --stat)"
if [ -n "$DIFFSTAT" ] ; then
    /usr/bin/git commit --quiet -m "Renewed letsencrypt certs"
    echo "$DIFFSTAT"
fi
popd > /dev/null

It temporarily disables my Apache webserver while it renews the certificates and then only outputs something to STDOUT (since my cronjob will email me any output) if certs have been renewed.

Since I'm using etckeeper to keep track of config changes on my servers, my renewal script also commits to the repository if any certs have changed.

External Monitoring

In order to catch mistakes or oversights, I use ssl-cert-check to monitor my domains once a day:

ssl-cert-check -s fmarier.org -p 443 -q -a -e francois@fmarier.org

I also signed up with Cert Spotter which watches the Certificate Transparency log and notifies me of any newly-issued certificates for my domains.

In other words, I get notified:

  • if my cronjob fails and a cert is about to expire, or
  • as soon as a new cert is issued.

The whole thing seems to work well, but if there's anything I could be doing better, feel free to leave a comment!

April 13, 2017 03:00 PM

Raymii.org

Distributed load testing with Tsung

At $dayjob I manage a large OpenStack Cloud. Next to that I also build high-performance and redundant clusters for customers. Think multiple datacenters, haproxy, galera or postgres or mysql replication, drbd with nfs or glusterfs and all sorts of software that can (and sometimes cannot) be clustered (redis, rabbitmq etc.). Our customers deploy their application on there and when one or a few components fail, their application stays up. Hypervisors, disks, switches, routers, all can fail without actual service downtime. Next to building such clusters, we also monitor and manage them. When we build such a cluster (fully automated with Ansible) we do a basic load test. We do this not for benchmarking or application flow testing, but to optimize the cluster components. Simple things like the mpm workers or threads in Apache or more advanced topics like MySQL or DRBD. Optimization there depends on the specifications of the servers used and the load patterns. Tsung is a high-performance but simple to configure and use piece of software written in Erlang. Configuration is done in a simple readable XML file. Tsung can be run distributed as well for large setups. It has good reporting and a live web interface for status and reports during a test.
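As a rough illustration of what such a configuration looks like (a minimal sketch, not taken from the full article; element names follow the Tsung documentation, but the DTD path, versions and defaults may differ on your system, and www.example.com is a placeholder target):

<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd">
<tsung loglevel="notice">
  <clients>
    <!-- run the load generator on the controller host itself -->
    <client host="localhost" use_controller_vm="true"/>
  </clients>
  <servers>
    <server host="www.example.com" port="80" type="tcp"/>
  </servers>
  <load>
    <!-- one phase: 10 new users per second for one minute -->
    <arrivalphase phase="1" duration="1" unit="minute">
      <users arrivalrate="10" unit="second"/>
    </arrivalphase>
  </load>
  <sessions>
    <session name="front-page" probability="100" type="ts_http">
      <request><http url="/" method="GET" version="1.1"/></request>
    </session>
  </sessions>
</tsung>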

April 13, 2017 12:00 AM

April 12, 2017

Vincent Bernat

Proper isolation of a Linux bridge

TL;DR: when configuring a Linux bridge, use the following commands to enforce isolation:

# bridge vlan del dev br0 vid 1 self
# echo 1 > /sys/class/net/br0/bridge/vlan_filtering

A network bridge (also commonly called a “switch”) brings several Ethernet segments together. It is a common element in most infrastructures. Linux provides its own implementation.

A typical use of a Linux bridge is shown below. The hypervisor is running three virtual hosts. Each virtual host is attached to the br0 bridge (represented by the horizontal segment). The hypervisor has two physical network interfaces:

  • eth0 is attached to a public network providing various services for the virtual hosts (DHCP, DNS, NTP, routers to Internet, …). It is also part of the br0 bridge.
  • eth1 is attached to an infrastructure network providing various services to the hypervisor (DNS, NTP, configuration management, routers to Internet, …). It is not part of the br0 bridge.

Typical use of Linux bridging with virtual machines

The main expectation of such a setup is that while the virtual hosts should be able to use resources from the public network, they should not be able to access resources from the infrastructure network (including resources hosted on the hypervisor itself, like a SSH server). In other words, we expect a total isolation between the green domain and the purple one.

That’s not the case. From any virtual host:

# ip route add 192.168.14.3/32 dev eth0
# ping -c 3 192.168.14.3
PING 192.168.14.3 (192.168.14.3) 56(84) bytes of data.
64 bytes from 192.168.14.3: icmp_seq=1 ttl=59 time=0.644 ms
64 bytes from 192.168.14.3: icmp_seq=2 ttl=59 time=0.829 ms
64 bytes from 192.168.14.3: icmp_seq=3 ttl=59 time=0.894 ms

--- 192.168.14.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2033ms
rtt min/avg/max/mdev = 0.644/0.789/0.894/0.105 ms

Why?

There are two main factors behind this behavior:

  1. A bridge can accept IP traffic. This is a useful feature if you want Linux to act as a bridge and provide some IP services to bridge users (a DHCP relay or a default gateway). This is usually done by configuring the IP address on the bridge device: ip addr add 192.0.2.2/25 dev br0.
  2. An interface doesn’t need an IP address to process incoming IP traffic. Additionally, by default, Linux accepts to answer ARP requests independently from the incoming interface.

Bridge processing

After turning an incoming Ethernet frame into a socket buffer, the network driver transfers the buffer to the netif_receive_skb() function. The following actions are executed:

  1. copy the frame to any registered global or per-device taps (e.g. tcpdump),
  2. evaluate the ingress policy (configured with tc),
  3. hand over the frame to the device-specific receive handler, if any,
  4. hand over the frame to a global or device-specific protocol handler (e.g. IPv4, ARP, IPv6).

For a bridged interface, the kernel has configured a device-specific receive handler, br_handle_frame(). This function won’t allow any additional processing in the context of the incoming interface, except for STP and LLDP frames or if “brouting” is enabled1. Therefore, the protocol handlers are never executed in this case.

After a few additional checks, Linux will decide if the frame has to be locally delivered:

  • the entry for the target MAC in the FDB is marked for local delivery, or
  • the target MAC is a broadcast or a multicast address.

In this case, the frame is passed to the br_pass_frame_up() function. A VLAN-related check is optionally performed. The socket buffer is attached to the bridge interface (br0) instead of the physical interface (eth0), is evaluated by Netfilter and sent back to netif_receive_skb(). It will go through the four steps a second time.

IPv4 processing

When a device doesn’t have a protocol-independent receive handler, a protocol-specific handler will be used:

# cat /proc/net/ptype
Type Device      Function
0800          ip_rcv
0011          llc_rcv [llc]
0004          llc_rcv [llc]
0806          arp_rcv
86dd          ipv6_rcv

Therefore, if the Ethernet type of the incoming frame is 0x800, the socket buffer is handled by ip_rcv(). Among other things, the three following steps will happen:

  • If the frame destination address is not the MAC address of the incoming interface, not a multicast one and not a broadcast one, the frame is dropped (“not for us”).
  • Netfilter gets a chance to evaluate the packet (in a PREROUTING chain).
  • The routing subsystem will decide the destination of the packet in ip_route_input_slow(): is it a local packet, should it be forwarded, should it be dropped, should it be encapsulated? Notably, the reverse-path filtering is done during this evaluation in fib_validate_source().

Reverse-path filtering (also known as uRPF, or unicast reverse-path forwarding, RFC 3704) enables Linux to reject traffic on interfaces which it should never have originated: the source address is looked up in the routing tables and if the outgoing interface is different from the current incoming one, the packet is rejected.

ARP processing

When the Ethernet type of the incoming frame is 0x806, the socket buffer is handled by arp_rcv().

  • Like for IPv4, if the frame is not for us, it is dropped.
  • If the incoming device has the NOARP flag, the frame is dropped.
  • Netfilter gets a chance to evaluate the packet (configuration is done with arptables).
  • For an ARP request, the values of arp_ignore and arp_filter may trigger a drop of the packet.

IPv6 processing

When the Ethernet type of the incoming frame is 0x86dd, the socket buffer is handled by ipv6_rcv().

  • Like for IPv4, if the frame is not for us, it is dropped.
  • If IPv6 is disabled on the interface, the packet is dropped.
  • Netfilter gets a chance to evaluate the packet (in a PREROUTING chain).
  • The routing subsystem will decide the destination of the packet. However, unlike IPv4, there is no reverse-path filtering2.

Workarounds

There are various methods to fix the situation.

We can completely ignore the bridged interfaces: as long as they are attached to the bridge, they cannot process any upper layer protocol (IPv4, IPv6, ARP). Therefore, we can focus on filtering incoming traffic from br0.

It should be noted that for IPv4, IPv6 and ARP protocols, the MAC address check can be circumvented by using the broadcast MAC address.

Protocol-independent workarounds

The four following fixes will indistinctly drop IPv4, ARP and IPv6 packets.

Using VLAN-aware bridge

Linux 3.9 introduced the ability to use VLAN filtering on bridge ports. This can be used to prevent any local traffic:

# echo 1 > /sys/class/net/br0/bridge/vlan_filtering
# bridge vlan del dev br0 vid 1 self
# bridge vlan show
port    vlan ids
eth0     1 PVID Egress Untagged
eth2     1 PVID Egress Untagged
eth3     1 PVID Egress Untagged
eth4     1 PVID Egress Untagged
br0     None

This is the most efficient method since the frame is dropped directly in br_pass_frame_up().

Using ingress policy

It’s also possible to drop the bridged frame early after it has been re-delivered to netif_receive_skb() by br_pass_frame_up(). The ingress policy of an interface is evaluated before any handler. Therefore, the following commands will ensure no local delivery (the source interface of the packet is the bridge interface) happens:

# tc qdisc add dev br0 handle ffff: ingress
# tc filter add dev br0 parent ffff: u32 match u8 0 0 action drop

In my opinion, this is the second most efficient method.

Using ebtables

Just before re-delivering the frame to netif_receive_skb(), Netfilter gets a chance to issue a decision. It’s easy to configure it to drop the frame:

# ebtables -A INPUT --logical-in br0 -j DROP

However, to the best of my knowledge, this part of Netfilter is known to be inefficient.

Using namespaces

Isolation can also be obtained by moving all the bridged interfaces into a dedicated network namespace and configure the bridge inside this namespace:

# ip netns add bridge0
# ip link set netns bridge0 eth0
# ip link set netns bridge0 eth2
# ip link set netns bridge0 eth3
# ip link set netns bridge0 eth4
# ip link del dev br0
# ip netns exec bridge0 brctl addbr br0
# for i in 0 2 3 4; do
>    ip netns exec bridge0 brctl addif br0 eth$i
>    ip netns exec bridge0 ip link set up dev eth$i
> done
# ip netns exec bridge0 ip link set up dev br0

The frame will still wander a bit inside the IP stack, wasting some CPU cycles and increasing the possible attack surface. But ultimately, it will be dropped.

Protocol-dependent workarounds

Unless you require multiple layers of security, if one of the previous workarounds is already applied, there is no need to apply one of the protocol-dependent fixes below. They are still worth knowing, because it is not uncommon to already have them in place.

ARP

The easiest way to disable ARP processing on a bridge is to set the NOARP flag on the device. The ARP packet will be dropped as the very first step of the ARP handler.

# ip link set arp off dev br0
# ip l l dev br0
8: br0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 50:54:33:00:00:04 brd ff:ff:ff:ff:ff:ff

arptables can also drop the packet quite early:

# arptables -A INPUT -i br0 -j DROP

Another way is to set arp_ignore to 2 for the given interface. The kernel will only answer to ARP requests whose target IP address is configured on the incoming interface. Since the bridge interface doesn’t have any IP address, no ARP requests will be answered.

# sysctl -qw net.ipv4.conf.br0.arp_ignore=2

Disabling ARP processing is not a sufficient workaround for IPv4. A user can still insert the appropriate entry in its neighbor cache:

# ip neigh replace 192.168.14.3 lladdr 50:54:33:00:00:04 dev eth0
# ping -c 1 192.168.14.3
PING 192.168.14.3 (192.168.14.3) 56(84) bytes of data.
64 bytes from 192.168.14.3: icmp_seq=1 ttl=49 time=1.30 ms

--- 192.168.14.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.309/1.309/1.309/0.000 ms

As the check on the target MAC address is quite loose, they don’t even need to guess the MAC address:

# ip neigh replace 192.168.14.3 lladdr ff:ff:ff:ff:ff:ff dev eth0
# ping -c 1 192.168.14.3
PING 192.168.14.3 (192.168.14.3) 56(84) bytes of data.
64 bytes from 192.168.14.3: icmp_seq=1 ttl=49 time=1.12 ms

--- 192.168.14.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.129/1.129/1.129/0.000 ms

IPv4

The earliest place to drop an IPv4 packet is with Netfilter3:

# iptables -t raw -I PREROUTING -i br0 -j DROP

If Netfilter is disabled, another possibility is to enable strict reverse-path filtering for the interface. In this case, since there is no IP address configured on the interface, the packet will be dropped during the route lookup:

# sysctl -qw net.ipv4.conf.br0.rp_filter=1

Another option is the use of a dedicated routing rule. Compared to the reverse-path filtering option, the packet will be dropped a bit earlier, still during the route lookup.

# ip rule add iif br0 blackhole

IPv6

Linux provides a way to completely disable IPv6 on a given interface. The packet will be dropped as the very first step of the IPv6 handler:

# sysctl -qw net.ipv6.conf.br0.disable_ipv6=1

Like for IPv4, it’s possible to use Netfilter or a dedicated routing rule.
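For example, the IPv6 equivalents of the IPv4 commands shown above would look something like this (a sketch mirroring them, not taken from the original text):

# ip6tables -t raw -I PREROUTING -i br0 -j DROP
# ip -6 rule add iif br0 blackhole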

About the example

In the above example, the virtual hosts get ICMP replies because they are routed through the infrastructure network to the Internet (e.g. the hypervisor has a default gateway which also acts as a NAT router to the Internet). This may not be the case.

If you want to check if you are “vulnerable” despite not getting an ICMP reply, look at the guest neighbor table to check if you got an ARP reply from the host:

# ip route add 192.168.14.3/32 dev eth0
# ip neigh show dev eth0
192.168.14.3 lladdr 50:54:33:00:00:04 REACHABLE

If you didn’t get a reply, you could still have issues with IP processing. Add a static neighbor entry before checking the next step:

# ip neigh replace 192.168.14.3 lladdr ff:ff:ff:ff:ff:ff dev eth0

To check if IP processing is enabled, check the bridge host’s network statistics:

# netstat -s | grep "ICMP messages"
    15 ICMP messages received
    15 ICMP messages sent
    0 ICMP messages failed

If the counters are increasing, the host is processing incoming IP packets.

One-way communication still allows a lot of bad things, like DoS attacks. Additionally, if the hypervisor happens to also act as a router, the reach is extended to the whole infrastructure network, potentially exposing weak devices (e.g. a PDU with an SNMP agent). If one-way communication is all that’s needed, the attacker can also spoof their source IP address, bypassing IP-based authentication.


  1. A frame can be forcibly routed (L3) instead of bridged (L2) by “brouting” the packet. This action can be triggered using ebtables.

  2. For IPv6, reverse-path filtering needs to be implemented with Netfilter, using the rpfilter match.

  3. If the br_netfilter module is loaded, the net.bridge.bridge-nf-call-iptables sysctl has to be set to 0. Otherwise, you also need to use the physdev match to not drop IPv4 packets going through the bridge.

by Vincent Bernat at April 12, 2017 07:58 AM

April 11, 2017

Everything Sysadmin

DNS as Code

StackOverflow has open sourced the DNS management system we've been using for years. I was the original author, and Craig Peterson has been writing most of the code lately. A blog post went live about the system here: Blog post: Introducing DnsControl - "DNS as Code" has Arrived

My favorite part of DNSControl? We can switch between DNS providers faster than cyberthreats can take them down.

Check it out if you manage DNS zones, use CloudFlare or Fast.ly CDNs, or just like to read Go (golang) code.

Tom

by Tom Limoncelli at April 11, 2017 07:56 PM

syslog.me

Exploring Docker overlay networks

Docker In the past months I have made several attempts to explore Docker overlay networks, but there were a few pieces to set up before I could really experiment and… well, let’s say that I have probably approached the problem the wrong way and wasted some time along the way. Not again. I have set aside some time and worked agile enough to do the whole job, from start to finish. Nowadays there is little point in creating overlay networks by hand, except that it’s still a good learning experience. And a learning experience with Docker and networking was exactly what I was after.

When I started exploring multi-host Docker networks, Docker was quite different than it is now. In particular, Docker Swarm didn’t exist yet, and there was a certain amount of manual work required in order to create an overlay network, so that containers located in different hosts can communicate.

Before Swarm, in order to set up an overlay network one needed to:

  • have at least two docker hosts to establish an overlay network;
  • have a supported key/value store available for the docker hosts to sync information;
  • configure the docker hosts to use the key/value store;
  • create an overlay network on one of the docker hosts; if everything worked well, the network would “propagate” to the other docker hosts that had registered with the key/value store;
  • create named containers on different hosts, then try to ping each one from the other using its name: if everything was done correctly, you would be able to ping the containers through the overlay network.

This looks like a simple high-level checklist. I’ll now describe the actual steps needed to get this working, leaving the details of my failures to the last section of this post.

Overlay networks for the impatient

A word of caution

THIS SETUP IS COMPLETELY INSECURE. Consul interfaces are exposed on the public interface of the host with no authentication. Consul communication is not encrypted. The traffic over the overlay network itself is not encrypted nor protected in any way. Using this set-up in any real-life environment is strongly discouraged.

Hardware set-up

I used three hardware machines, all running Debian Linux “Jessie”; all machines got a dynamic IPv4 address from DHCP:

  • murray: 192.168.100.10, running consul in server mode;
  • tyrrell: 192.168.100.13, docker host on a 4.9 Linux kernel installed from backports;
  • brabham: 192.168.100.15, docker host, same configuration as tyrrell.

Running the key-value store

create a directory structure, e.g.:

mkdir consul
cd consul
mkdir bin config data

Download consul and unpack it in the bin subdirectory.
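For example, for the 0.8.0 build that shows up in the logs below (the URL follows HashiCorp’s standard release location; adjust the version and architecture as needed):

cd bin
wget https://releases.hashicorp.com/consul/0.8.0/consul_0.8.0_linux_amd64.zip
unzip consul_0.8.0_linux_amd64.zip
cd ..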

Finally, download the start-consul-master.sh script from my github account into the consul directory you created, make it executable and run it:

chmod a+x start-consul-master.sh
./start-consul-master.sh

If everything goes well, you’ll now have a consul server running on your machine. If not, you may have something to tweak in the script. In that case, please refer to the comments in the script itself.

On murray, the script started consul successfully. These are the messages I got on screen:

bronto@murray:~/Downloads/consul$ ./start-consul-master.sh 
Default route on interface wlan0
The IPv4 address on wlan0 is 192.168.100.10
The data directory for consul is /home/bronto/Downloads/consul/data
The config directory for consul is /home/bronto/Downloads/consul/config
Web UI available at http://192.168.100.10:8500/ui
DNS server available at 192.168.100.10 port 8600 (TCP/UDP)
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v0.8.0'
           Node ID: 'eeaf4121-262a-4799-a175-d7037d20695f'
         Node name: 'consul-master-murray'
        Datacenter: 'dc1'
            Server: true (bootstrap: true)
       Client Addr: 192.168.100.10 (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 192.168.100.10 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: 

==> Log data will now stream in as it occurs:

    2017/04/11 14:53:56 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:192.168.100.10:8300 Address:192.168.100.10:8300}]
    2017/04/11 14:53:56 [INFO] raft: Node at 192.168.100.10:8300 [Follower] entering Follower state (Leader: "")
    2017/04/11 14:53:56 [INFO] serf: EventMemberJoin: consul-master-murray 192.168.100.10
    2017/04/11 14:53:56 [WARN] serf: Failed to re-join any previously known node
    2017/04/11 14:53:56 [INFO] consul: Adding LAN server consul-master-murray (Addr: tcp/192.168.100.10:8300) (DC: dc1)
    2017/04/11 14:53:56 [INFO] serf: EventMemberJoin: consul-master-murray.dc1 192.168.100.10
    2017/04/11 14:53:56 [WARN] serf: Failed to re-join any previously known node
    2017/04/11 14:53:56 [INFO] consul: Handled member-join event for server "consul-master-murray.dc1" in area "wan"
    2017/04/11 14:54:03 [ERR] agent: failed to sync remote state: No cluster leader
    2017/04/11 14:54:06 [WARN] raft: Heartbeat timeout from "" reached, starting election
    2017/04/11 14:54:06 [INFO] raft: Node at 192.168.100.10:8300 [Candidate] entering Candidate state in term 3
    2017/04/11 14:54:06 [INFO] raft: Election won. Tally: 1
    2017/04/11 14:54:06 [INFO] raft: Node at 192.168.100.10:8300 [Leader] entering Leader state
    2017/04/11 14:54:06 [INFO] consul: cluster leadership acquired
    2017/04/11 14:54:06 [INFO] consul: New leader elected: consul-master-murray
    2017/04/11 14:54:07 [INFO] agent: Synced node info

Note that the consul web UI is available (on murray, it was on http://192.168.100.10:8500/ui as you can read from the output of the script). I suggest that you take a look at it while you progress in this procedure; in particular, look at how the objects in the KEY/VALUE section change when the docker hosts join the cluster and when the overlay network is created.

Using the key/value store in docker engine

Since my set-up was not permanent (I just wanted to test out the overlay networks and didn’t mean to have a node always on and exposing the key/value store), I made only a temporary change to Docker Engine’s configuration by stopping the system service and then running the docker daemon by hand. That’s not something you would do in production. If you are interested in making the change permanent, please look at the wonderful article in the “Technical scratchpad” blog.

On both docker hosts I ran:

systemctl stop docker.service

Then on brabham I ran:

/usr/bin/dockerd -H unix:///var/run/docker.sock --cluster-store consul://192.168.100.10:8500 --cluster-advertise=192.168.100.15:0

while on tyrrell I ran:

/usr/bin/dockerd -H unix:///var/run/docker.sock --cluster-store consul://192.168.100.10:8500 --cluster-advertise=192.168.100.13:0

If everything goes well, messages on screen will confirm that you have successfully joined consul. E.g., this was on brabham:

INFO[0001] 2017/04/10 19:55:44 [INFO] serf: EventMemberJoin: brabham 192.168.100.15

This one was also on brabham, letting us know that tyrrell had also joined the family:

INFO[0058] 2017/04/10 19:56:41 [INFO] serf: EventMemberJoin: tyrrell 192.168.100.13

The parameter --cluster-advertise is quite important. If you don’t include it, the creation of the overlay network at the following step will succeed, but containers in different hosts will fail to communicate through the overlay network, which will make the network itself pretty pointless.
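If you do want to make this configuration permanent instead of running dockerd by hand, one common approach is a systemd drop-in along these lines (a sketch only; the ExecStart line should match your distribution’s unit file, and the addresses are the ones used for tyrrell above):

# /etc/systemd/system/docker.service.d/cluster.conf
[Service]
# clear the packaged ExecStart, then start dockerd with the cluster options
ExecStart=
ExecStart=/usr/bin/dockerd -H unix:///var/run/docker.sock --cluster-store consul://192.168.100.10:8500 --cluster-advertise=192.168.100.13:0

Then reload systemd and restart the service:

systemctl daemon-reload
systemctl restart docker.service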

Creating the overlay network

On one of the docker hosts, create an overlay network. In my case, I ran the following command on tyrrell to create the network 10.17.0.0/16 to span both hosts, and name it bronto-overlay:

$ docker network create -d overlay --subnet=10.17.0.0/16 bronto-overlay
c4bcd49e4c0268b9fb22c7f68619562c6a030947c8fe5ef2add5abc9446e9415

The output, a long network ID, looks promising. Let’s verify that the network is actually created:

$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
3277557c8f42        bridge              bridge              local
80dfa291f573        bronto-bridged      bridge              local
c4bcd49e4c02        bronto-overlay      overlay             global
dee1b78004ef        host                host                local
f1fb0c7be326        none                null                local

Promising indeed: there is a bronto-overlay network and the scope is global. If the command really succeeded, we should find it on brabham, too:

$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
90221b0a57e1        bridge              bridge              local
c4bcd49e4c02        bronto-overlay      overlay             global
3565b28a327b        docker_gwbridge     bridge              local
f4691f3066cb        host                host                local
86bdd8ab9b90        none                null                local

There it is. Good.

Test that everything works

With the network set up, I should be able to create one container on each host, bound to the overlay network, and they should be able to ping each other. Not only that: they should be able to resolve each other’s names through Docker’s DNS service despite being on different machines. Let’s see.

On each machine I ran:

docker run --rm -it --network bronto-overlay --name in-$HOSTNAME debian /bin/bash

That will create a container named in-tyrrell on tyrrell and a container named in-brabham on brabham. Their main network interface will appear to be bound to the overlay network. E.g. on in-tyrrell:

root@d86df4662d7e:/# ip addr show dev eth0
68: eth0@if69: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:11:00:03 brd ff:ff:ff:ff:ff:ff
    inet 10.17.0.3/16 scope global eth0
       valid_lft forever preferred_lft forever

We now try to ping the other container:

root@d86df4662d7e:/# ping in-brabham
PING in-brabham (10.17.0.2): 56 data bytes
64 bytes from 10.17.0.2: icmp_seq=0 ttl=64 time=0.857 ms
64 bytes from 10.17.0.2: icmp_seq=1 ttl=64 time=0.618 ms
^C--- in-brabham ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.618/0.738/0.857/0.120 ms
root@d86df4662d7e:/#

Success!

How things really went

Things weren’t so straightforward in reality.

I will use a consul container. Or not.

When I was designing my experiment I thought I would run consul in a container on one of the two hosts. I thus spent some time playing with the progrium/consul container, as indicated by Docker’s documentation, until I found out that it hadn’t been updated in two years… You can find the outcome of my experiment with progrium/consul on github.

I thus switched to Docker’s official consul image. It was similar enough to progrium/consul to seem an easy port, and different enough that it made me hit the wall a few times before I got things right with that image, too. With the help of a couple of scripts I could now easily start a server container and a client container. You can find the outcome of my experiments with docker’s consul image on github, too.

Then, I understood that the whole thing was pointless. You don’t need to have a consul client running on the docker host, because the docker engine itself will be the client. And you can’t have the consul server running as a container on one of the docker hosts. In fact, the docker engine itself needs to register at start-up with the consul server, but the consul server won’t be running because docker hasn’t started… ouch!

That wasn’t a complete waste of time anyway. In fact, with the experience acquired in making consul containers work and the scripts that I had already written, it was an easy task to configure a consul server on a separate laptop (murray). The script that I used to run the consul server is also available on github.

The documentation gap

When I started playing with docker, Docker Swarm didn’t exist yet, or was still in a very primitive stage. The only way to make containers on different hosts communicate was overlay networks, and there was no automated procedure to create them, therefore Docker’s documentation explained in some detail how they should be set up. With the release of Swarm, overlay networks are created automatically by Swarm itself, and not much about the “traditional” setup of overlay networks was left in the documentation. Walking through the “time machine” in the docs web page proved to be problematic, but luckily enough I found this article on the Technical Scratchpad blog that helped me to connect the dots.

Don’t believe in magic

When I initially started the docker engine and bound it to the consul server I left out the --cluster-advertise option. All in all, docker must be smart enough to guess the right interface for that, right?

No, wrong.

When I did leave the option out, everything seemed to work. In particular, when I created the overlay network on tyrrell I could also see it on brabham. And when I created a container on each host, docker network inspect bronto-overlay showed that both containers were registered on the network. Except that they couldn’t communicate. The problem seems to be known, and things started to work only when I added the --cluster-advertise option with a proper value for each host.


Tagged: consul, Debian, docker, linux, networking

by bronto at April 11, 2017 02:42 PM

Sarah Allen

the making of a fireplace

I built a fake fireplace in celebration of our team shipping Cloud Functions for Firebase. The codename was “hearth” — the product was almost ready for public beta by the time I joined the team, yet I know it was an epic journey to create a long-awaited feature and I felt the milestone deserved dramatic punctuation.

Red brick surrounds a monitor with video of log fire, wooden top has rows of whiskey bottles and a card

If you would like to build your very own fake fireplace, here’s how…

Initial Research

I reached out to our helpful facilities staff to make sure that I understood the constraints of the space. (I didn’t want to buy or create something that would be immediately taken down!) James Miller from facilities investigated some options:

“Since a hearth is technically the flat stone floor or the part outside of the fireplace, putting up just a fireplace and mantel leaves out the most important part, but including it presents a tripping hazard.” He suggested a quick and easy solution would be to temporarily change the display to play an 8-hour yule log video, or to use an extra computer monitor with some fake foam bricks. He noted that some prop fireplaces look pretty good in the photos, but are just cardboard. There are more expensive electronic fireplaces, but we can’t have anything that includes heating features similar to a space heater, and he reported “most that I’ve seen online are also missing the most important part… the actual hearth.”

Initially I felt that a literal interpretation of the “hearth” code name was not a critical element. We explored buying a real fake fireplace, but the ones we found were very small and we felt they were very expensive relative to the potential impact. However, his thoughts on the meaning of “hearth” later led to some important design elements. I love being involved in a collaborative creative process!

James suggested that something in a stage prop style would have dramatic effect, perhaps using styrofoam so that it would be light and easy to move, if we needed to shift it in the future around team moves. In brainstorming location, he helped me remember to pick out a spot with an electrical outlet.

Measure Twice, Cut Once

An aisle with people at desks on either side; at the end is a small bookshelf with bottles of alcohol and above that a large monitor on the wall

I surreptitiously took a photo of the area, and arrived early one morning with a tape measure to determine its maximum dimension. I then measured the computer monitor I planned to use to determine the minimum internal dimension.

There was actually a third, very important measurement: the internal size of my car, which later became a critical optimization. I designed the fireplace in three pieces, so it could fit into the car, which also made it much lighter and easier to transport. Styrofoam is rather fragile, and the smaller pieces fit easily through doors.

Tools & Supplies

Tools: tape measure, hammer, screw driver, propane torch, safety goggles, mask, electric drill, electric sander, garden shears, mat knife, level, steel square

Supplies: small nails, wood, styrofoam panels, gorilla glue, sandpaper, acrylic paint (red, yellow, white, black), gloss latex paint (off-white), sharpie, cardboard

I had never worked with styrofoam before and read that it could be carved with a sharp knife, acetone or a hot knife. If it were a small project, a regular soldering iron would likely have been fine, but I wanted something that would be fast and really liked that the propane torch I found had two different tips as well as an open flame option. The small hand-held propane filled torch is also easier to work with than my electric soldering iron which requires an extension cord.

Construction Details

I tested carving and painting the styrofoam very early, including figuring out how long the paint took to dry. I found that painting the white cracks before painting the red bricks made it come out a bit better. It was also good to test carving to create an effect that looks like mortar.

See key steps in photos

  1. Wooden frame: I used 1″ thick boards for the frame. Discount Builders in San Francisco has a friendly staff and cut long 12′ or 8′ boards into the lengths I needed. The hearth is a separate component, which is like a hollow box with one side missing, where the missing side is actually attached to the larger frame, so the top can rest on it for stability and is not actually physically attached. I used a level and steel square for an even frame and wood screws.
  2. Top: I used a single 1″ thick board and drilled small holes for pegs to attach the top to the frame in a way that could be easily removed, yet fit snugly. I used garden shears to cut a long wooden dowel into 1″ lengths to create the pegs. I used an electric sander for rounded edges on three sides, and finished with fine sandpaper. I then coated with a few coats of gloss paint (which would have been easier if I had remembered primer!).
  3. Styrofoam: for the large panels I found that using small nails to hold the styrofoam in place made it easier to keep everything steady while gluing it all together. I attached all of the styrofoam panels before carving bricks with the torch. Large cuts were most effective with a mat knife.
  4. Carving the bricks: I found a few inspirational Fireplace images via Google Image Search and drew bricks with a Sharpie, then traced the lines with the torch using the soldering gun tip. Then I went back and made the brick edges a bit irregular and created a few cracks in random bricks. I also used an open flame in areas to create more texture.
  5. Painting the bricks: I mixed acrylic paint with a little dish soap and water, which makes it easier to coat the styrofoam. I noticed that most fireplaces have irregular brick colors, so I varied the colors a bit with yellow and brown highlights, as well as black “sooty” areas.
  6. The inside is painted black with a piece of cardboard as backing.

Additional Resources

I studied sculpture in college, so I had practical experience with wood construction, and learned a bit more from the amazing people who publish about their experience on the web. Here are some links to resources that I found particularly helpful.

Tutorials

Videos

by sarah at April 11, 2017 03:59 AM

April 10, 2017

Everything Sysadmin

Review: Tivo BOLT+

The newest TiVo model is called the BOLT. I've been using TiVo since the first generation, back when it used a dial-up modem to download the tv guide listings and software updates. My how far we've come!

If you have a TiVo already, the BOLT user interface looks and acts the same but everything is faster and better. There is a new feature that automatically skips commercials (if the TV show permits it), and a feature that plays shows at 30% faster speed, with pitch-correction. Everything is faster. This unit has more RAM and a faster CPU than any previous TiVo model, which really shows in the UI. Everything is snappier and that makes it more usable. Most importantly, apps like Netflix, Hulu, HBO Go, and Amazon Prime start up instantly instead of making you wait a frustratingly long amount of time. On the old model I'd think twice before starting the Netflix app because it took a full minute to start. If I accidentally hit a key and exited the app, I'd often give up and return to watching recorded TV rather than start the Netflix app again. It just wasn't worth it. The new model eliminates that kind of problem.

That said, the real star of the show is the setup process. Every home electronics manufacturer should be jealous. Since I bought my TiVo online from TiVo.com, they've connected its serial number to my account and therefore have been able to do most of the setup ahead of time. I simply plugged it in, moved my CableCard from my old device, and followed a few basic instructions. I was shocked at how fast the process was. I was shocked at how streamlined it was. This level of perfection must have touched on not just TiVo's engineering team, but everything from logistics, to finance, to packaging, to documentation. I kept saying, "That was too fast! I must have forgotten something!" But no, it really was set up and working. I was sitting back streaming an episode of Rick and Morty.

Even the documentation is excellent. Having such a refined setup process reduces the amount of documentation needed, of course. Therefore the docs focus on what's new and basic tips. For example, it points out that even if you haven't received your CableCard yet (that comes from your cable provider and they often drag their feet sending you one), you should do the setup anyway as you'll be able to watch unencrypted channels and stream. Good point. I would have assumed I should leave the TiVo in its box until the CableCard arrived... an unnecessary delay! This tells me that their customer support group and documentation group actually talk with each other.

The Netflix app is much better than on my old TiVo Series3 and TiVo Premiere models. The old models have a Netflix app that uses the buttons on the remote in ways that I can only describe as "creative". Had the product designer never actually used a TiVo before? On the BOLT the Netflix app uses the buttons for the same functions as when watching TV. A lot of the functionality that hasn't changed just plain works better because the CPU is faster. For example, on the older slow hardware it is typical to press a button, the TiVo is slow to respond, you press the button again because you think maybe the TiVo didn't receive the first keypress, then both keypresses execute and you are fucked. I hate to dis the engineering team at Netflix, but the app just fails (on older TiVos) in ways that smell like the developers weren't given access to actual hardware and designed it without realizing how slow the CPU would be. The TiVo BOLT's faster CPU seems to have caught up with their slow software. This is really the first TiVo model where using Netflix meets my high bar to be considered "usable". Netflix on my Mac still gives slightly finer control (you can skip to a specific place by clicking the timeline), but I'll be watching a lot more Netflix on my TiVo now.

The HBO app is better than on the older TiVos but every time I use HBO Go I feel like I'm using a product that is just struggling to run, leaving the engineers no time to make it run well. That said, the faster CPU makes the HBO app less annoying to use. Good job, Hollywood!

The only feature that the new TiVo is missing is the ability to download streams and watch them later. Right now it seems like any streaming is buffered for only a few seconds. If your ISP is having a bad day, you might spend a lot of time waiting for it to buffer. For programming I know I'm going to watch, I wish the streaming services would just let TiVo download the show to my harddrive. I'm sure their lawyers have their knickers in a twist about such features (IP lawyers think that 1's and 0's stored on a hard drive are totally different than ones stored in a buffer. When will they learn?). That said, such a feature would probably make it easier on the Netflix CDN considering they could trickle-feed such videos to me during quieter network times. But I digress..

Lastly... you might be wondering why I'm writing about TiVo on a blog for system administrators. Well, I believe that using a DVR is an important part of time management. Using a DVR puts you in control of your TV-watching time. Otherwise, the TV network controls you. This is why I dedicate an entire page to DVR tips in Time Management for System Administrators.

If you have a TiVo and are considering getting the BOLT or BOLT+, I think it is completely worth it. If you don't have a DVR, I don't think you can go wrong with a TiVo.

Five stars. Would buy again!

by Tom Limoncelli at April 10, 2017 03:00 PM

Raymii.org

Run software on the tty1 console instead of getty login on Ubuntu 14.04 and 16.04

Recently I wanted to change the default login prompt on the tty1 console on an OpenStack instance to automatically run htop. Instead of logging in via the console, I wanted it to start up htop right away and nothing else. Ubuntu 14.04 uses init and Ubuntu 16.04 uses systemd. Both ways are shown in this tutorial.
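The full article walks through both variants; as a rough sketch of the systemd (16.04) side of it, an override for the getty unit on tty1 might look something like this (assuming htop is installed at /usr/bin/htop; details may differ from the original tutorial):

# /etc/systemd/system/getty@tty1.service.d/override.conf
[Service]
# clear the template's ExecStart, then run htop on the console instead of a login prompt
ExecStart=
ExecStart=/usr/bin/htop

$ sudo systemctl daemon-reload
$ sudo systemctl restart getty@tty1.service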

April 10, 2017 12:00 AM

April 09, 2017

Raymii.org

Check HTTP status code for a page on all DNS records

This is a small snippet using curl to check the status code of a given URL on all DNS records for a given domain. This site has a few A records in round robin mode, and sometimes the automatic deployment fails. Using this query I can check which server is the culprit and fix it manually.
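One way to do this (not necessarily the exact query from the full article) is to loop over the A records with dig and pin curl to each address with --resolve:

for ip in $(dig +short raymii.org A); do
  echo -n "$ip: "
  curl --resolve raymii.org:443:$ip -s -o /dev/null -w "%{http_code}\n" https://raymii.org/
done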

April 09, 2017 12:00 AM

April 08, 2017

Evaggelos Balaskas

regular expressions in find

After upgrading one of my linux boxes from CentOS 6.8 to 6.9, I wanted to find out the files that I had to review. From experience I already knew what file names I should check: .rpmsave & .rpmnew

The command I usually type is: find

# find /etc/|egrep ".*rpm(save|new)$"

/etc/rc.d/init.d/postgrey.rpmsave
/etc/php.ini.rpmnew
/etc/sudoers.rpmnew
/etc/postfix/postgrey_whitelist_clients.local.rpmsave
/etc/sysctl.conf.rpmnew

a nicer way is to tell find to only match regular files (-type f), excluding directories and other file types:

# find /etc/ -type f |egrep ".*rpm(save|new)$"

/etc/rc.d/init.d/postgrey.rpmsave
/etc/php.ini.rpmnew
/etc/sudoers.rpmnew
/etc/postfix/postgrey_whitelist_clients.local.rpmsave
/etc/sysctl.conf.rpmnew

but find is a very powerful command, and reading through the manual page:

-regex pattern

File name matches regular expression pattern. This is a match on the whole path, not a
search. For example, to match a file named ‘./fubar3’, you can use the regular expression
‘.*bar.’ or ‘.*b.*3’, but not ‘f.*r3’. The regular expressions understood by find are by
default Emacs Regular Expressions, but this can be changed with the -regextype option.

ok, we are getting somewhere. I can use -regex with an emacs regular expression pattern to search.

# find /etc/ -type f -regex ".*rpm(save|new)$"

Nothing in output !!! aka this is a “WAT ?????” moment.

wat.jpg

Perhaps I am not typing an emacs regex.
Let’s try to use an alternative:

# find /etc/ -type f -regextype -name "*rpmsave$"

valid types are ‘findutils-default’, ‘awk’, ‘egrep’, ‘ed’, ‘emacs’, ‘gnu-awk’, ‘grep’, ‘posix-awk’, ‘posix-basic’, ‘posix-egrep’, ‘posix-extended’, ‘posix-minimal-basic’, ‘sed’.

With this typo, I can find out what the alternatives are.

ok, let’s try egrep or anything else:


# find /etc/ -type f -regex ".*rpm(save|new)$" -regextype sed

# find /etc/ -type f -regex ".*rpm(save|new)$" -regextype posix-egrep
# find /etc/ -type f -name ".*rpm(save|new)$" -regextype posix-egrep

# find /etc/ -type f -name ".*rpm(save|new)$" -regextype egrep
# find /etc/ -type f -name ".*rpm(save|new)$" -regextype sed
# find /etc/ -type f -name ".*rpmsave$" -regextype sed
# find /etc/ -type f -name ".*rpmsave$" -regextype posix-egrep
# find /etc/ -type f -name ".*rpmsave$" -regextype egrep

# find /etc/ -type f -regex ".*rpm(save)$" -regextype egrep
# find /etc/ -type f -regex ".*rpm(save|new)$" -regextype egrep

Nothing !!!

Am I typing this correctly ?

# find /etc/ -type f | egrep ".*rpm(save|new)$"

/etc/rc.d/init.d/postgrey.rpmsave
/etc/php.ini.rpmnew
/etc/sudoers.rpmnew
/etc/postfix/postgrey_whitelist_clients.local.rpmsave
/etc/sysctl.conf.rpmnew

then, what the h3ll?

Let’s read the manual page, once more:

The -daystart, -follow and -regextype options are different in this respect, and have an effect only on tests which appear later in the command line. Therefore, for clarity, it is best to place them at the beginning of the expression

Exhhmmmmm

I need to put -regextype before the regex.


# find /etc/ -type f -regextype egrep -regex ".*rpm(save|new)$"

/etc/rc.d/init.d/postgrey.rpmsave
/etc/php.ini.rpmnew
/etc/sudoers.rpmnew
/etc/postfix/postgrey_whitelist_clients.local.rpmsave
/etc/sysctl.conf.rpmnew

Yeah !
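(For completeness: the default Emacs-style regex syntax would also have matched, had the group and alternation been escaped, e.g.:)

# find /etc/ -type f -regex ".*rpm\(save\|new\)$"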

rtfm.jpg

April 08, 2017 05:08 PM

April 07, 2017

Everything Sysadmin

Hollywood doesn't understand software

Hollywood doesn't understand software. Not, at least, as well as high-tech companies do. This is very frustrating. Bad software keeps wrecking my entertainment experience.

I'm currently writing an article and I need to come up with a term that means software that was written by old-school (historically non-technology) companies just so they can say "Look! we made an app! Will you shut up, now?!" as opposed to software that has great fit and finish, gets updated regularly, and stays current.

My favorite example of this is the CBS streaming software. It seems like it was written just to shut up people that have been asking to stream NCIS, not because CBS actually wants to be in the streaming business.

The HBO streaming software is frustratingly "almost good".

The Weight Watchers app is also in this category. I don't think Oprah approves of this app. Or, if she does, she hasn't seen the competition's applications. This is a "shut up and use it" app, rather than something they're betting the company on. I'm a WW success story but only because I learned how to work around the app, not with it.

Most enterprise software seems to be in this category. "Oh shit, it actually works? Better ship it!" seems to be the rule for most enterprise software. There's no budget for fit and finish for internally-developed apps. There are exceptions to this, of course, but not that many.

Software is eating the world, yo! Develop in-house software competency, hire executives and managers that understand SDLC and operational principles (i.e. DevOps). You can't take a pass on this and hope it is going away. Computers are not a fad. The internet isn't going away.

P.S. No offense to my friends at CBS, WW, HBO, enterprises, and Hollywood. It isn't you. It is your management.

by Tom Limoncelli at April 07, 2017 06:59 PM

April 01, 2017

syslog.me

Improving your services, the DevOps way

devops-italiaOn March 10th I was in Bologna for Incontro DevOps Italia 2017, the Italian DevOps meeting organized by the great people at BioDec. The three tracks featured several talks in both Italian and English, and first-class international speakers. And, being a conference in Bologna, it also featured first-class local food that no other conference around the world will ever be able to match.

In my presentation, “Improving your services, the DevOps way”, I illustrated the methodology we used in Opera to improve our mail service with a rolling approach, where the new infrastructure was built alongside the existing one through collaboration and a mix of agile and DevOps techniques, and progressively replaced it. The slides are available on SpeakerDeck.

I understand from the organizers that the recordings will be available some time in April. However, they will be of little use for non-Italian speakers, since at the very last minute I decided to hold the speech in Italian — a choice that I deeply regretted. I have a plan to do the presentation again for the DevOps Norway meetup group, though I haven’t set a date yet. When I do, I’ll see if I can at least provide a Periscope stream.


Tagged: conference, DevOps, Opera, Sysadmin

by bronto at April 01, 2017 03:07 PM

Feeding the Cloud

Manually expanding a RAID1 array on Ubuntu

Here are the notes I took while manually expanding a non-LVM encrypted RAID1 array on an Ubuntu machine.

My original setup consisted of a 1 TB drive along with a 2 TB drive, which meant that the RAID1 array was 1 TB in size and the second drive had 1 TB of unused capacity. This is how I replaced the old 1 TB drive with a new 3 TB drive and expanded the RAID1 array to 2 TB (leaving 1 TB unused on the new 3 TB drive).

Partition the new drive

In order to partition the new 3 TB drive, I started by creating a temporary partition on the old 2 TB drive (/dev/sdc) to use up all of the capacity on that drive:

$ parted /dev/sdc
unit s
print
mkpart
print

Then I initialized the partition table and created the EFI partition on the new drive (/dev/sdd):

$ parted /dev/sdd
unit s
mktable gpt
mkpart

Since I want to have the RAID1 array be as large as the smaller of the two drives, I made sure that the second partition (/home) on the new 3 TB drive had:

  • the same start position as the second partition on the old drive
  • the end position of the third partition (the temporary one I just created) on the old drive

I created the partition and flagged it as a RAID partition:

mkpart
toggle 2 raid

and then deleted the temporary partition on the old 2 TB drive:

$ parted /dev/sdc
print
rm 3
print

Create a temporary RAID1 array on the new drive

With the new drive properly partitioned, I created a new RAID array for it:

mdadm /dev/md10 --create --level=1 --raid-devices=2 /dev/sdd1 missing

and added it to /etc/mdadm/mdadm.conf:

mdadm --detail --scan >> /etc/mdadm/mdadm.conf

which required manual editing of that file to remove duplicate entries.

Create the encrypted partition

With the new RAID device in place, I created the encrypted LUKS partition:

cryptsetup -h sha256 -c aes-xts-plain64 -s 512 luksFormat /dev/md10
cryptsetup luksOpen /dev/md10 chome2

I took the UUID for the temporary RAID partition:

blkid /dev/md10

and put it in /etc/crypttab as chome2.

Then, I formatted the new LUKS partition and mounted it:

mkfs.ext4 -m 0 /dev/mapper/chome2
mkdir /home2
mount /dev/mapper/chome2 /home2

Copy the data from the old drive

With the home partitions of both drives mounted, I copied the files over to the new drive:

eatmydata nice ionice -c3 rsync -axHAX --progress /home/* /home2/

making use of wrappers that preserve system responsiveness during I/O-intensive operations.

Switch over to the new drive

After the copy, I switched over to the new drive in a step-by-step way:

  1. Changed the UUID of chome in /etc/crypttab.
  2. Changed the UUID and name of /dev/md1 in /etc/mdadm/mdadm.conf.
  3. Rebooted with both drives.
  4. Checked that the new drive was the one used in the encrypted /home mount using: df -h.

Add the old drive to the new RAID array

With all of this working, it was time to clear the mdadm superblock from the old drive:

mdadm --zero-superblock /dev/sdc1

and then change the second partition of the old drive to make it the same size as the one on the new drive:

$ parted /dev/sdc
rm 2
mkpart
toggle 2 raid
print

before adding it to the new array:

mdadm /dev/md1 -a /dev/sdc1
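mdadm then resyncs the data onto the newly added member; the progress of the rebuild can be followed with:

cat /proc/mdstat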

Rename the new array

To change the name of the new RAID array back to what it was on the old drive, I first had to stop both the old and the new RAID arrays:

umount /home
cryptsetup luksClose chome
mdadm --stop /dev/md10
mdadm --stop /dev/md1

before running this command:

mdadm --assemble /dev/md1 --name=mymachinename:1 --update=name /dev/sdd2

and updating the name in /etc/mdadm/mdadm.conf.

The last step was to regenerate the initramfs:

update-initramfs -u

before rebooting into something that looks exactly like the original RAID1 array but with twice the size.

April 01, 2017 06:00 AM

The Lone Sysadmin

Install the vCenter Server Appliance (VCSA) Without Ephemeral Port Groups

Trying to install VMware vCenter in appliance/VCSA form straight to a new ESXi host? Having a problem where it isn’t listing any networks, and it’s telling you that “Non-ephemeral distributed virtual port groups are not supported” in the little informational bubble next to it? Thinking this is Chicken & Egg 101, because you can’t have an […]

The post Install the vCenter Server Appliance (VCSA) Without Ephemeral Port Groups appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at April 01, 2017 04:07 AM

March 28, 2017

Sean's IT Blog

Configuring Duo Security MFA for Horizon Unified Access Gateway

Note: After publishing, I decided to rework this blog post a bit to separate the AD-integrated Duo configuration from the Duo-only configuration.  This should make the post easier to follow.

In my last post, I went through the steps for deploying a Horizon Access Point/Unified Access Gateway using the PowerShell deployment script.  That post walked through the basic deployment steps.

The Unified Access Gateway supports multiple options for two-factor authentication, and many real-world deployments will use some form of two-factor when granting users access to their desktops and applications remotely.  The Unified Access Gateway supports the following two-factor authentication technologies:

  • RSA SecurID
  • RADIUS
  • Certificate Based/Smart-Cards

Because I’m doing this in a lab environment, I decided to use a RADIUS-based technology for this post.  I’ve been using Duo Security for a while because they support RADIUS, have a mobile app, and have a free tier.  Duo also supports VMware Horizon, although they do not currently have any documentation on integrating with the Access Point/Unified Access Gateway.

Duo Security for Multi-factor Authentication

Duo Security is a cloud-based MFA provider.  Duo utilizes an on-premises Authentication Proxy to integrate with customer systems.  In addition to providing their own authentication source, they can also integrate into existing Active Directory environments or RADIUS servers.

When using Active Directory as the authentication source, Duo will utilize the same username and password as the user’s AD account.  It will validate these against Active Directory before prompting the user for their second authentication factor.

Understanding Unified Access Gateway Authentication Path

Before I configure the Unified Access Gateway for two-factor authentication with Duo, let’s walk through how the appliance handles authentication for Horizon environments and how it compares to the Security Server.  There are some key differences between how these two technologies work.

When the Security Server was the only option, two-factor authentication was enabled on the Connection Servers.  When users signed in remotely, the security server would proxy all authentication traffic back to the connection server that it was paired with.  The user would first be prompted for their username and their one-time password, and if that validated successfully, they would be prompted to log in with their Active Directory credentials.  The reliance on connection servers meant that two sets of connection servers needed to be maintained – one internal facing without multi-factor authentication configured and one external facing with multi-factor configured.

Note: When using Duo in this setup, I can configure Duo to use the same initial credentials as the user’s AD account and then present the user with options for how they want to validate their identity – a phone call, push to a mobile device, or passcode.  I will discuss that configuration below.

The Unified Access Gateway setup is, in some ways, similar to the old Security Server/Connection Server setup.  If 2FA is used, the user is prompted for the one-time password first, and if that is successful, the user is prompted for their Active Directory credentials.  But there are two key differences here.  First, the Unified Access Gateway is not tied to a specific Connection Server, and it can be pointed at a load-balanced pool of connection servers.  The other difference is 2FA is validated in the DMZ by the appliance, so 2FA does not need to be configured on the Connection Servers.

Horizon has a couple of multi-factor authentication options that we need to know about when doing the initial configuration.  These two settings are:

  • Enforce 2-factor and Windows user name matching (matchWindowsUserName in UAG) – enable this when the MFA user name and the Windows user name are the same.
  • Use the same username and password for RADIUS and Windows authentication (windowsSSOEnabled in UAG) – enable this when the RADIUS password and the Windows password are the same.  When this setting is enabled, the domain sign-on prompt is skipped.

If my two-factor authentication system supports Active Directory authentication, I can use my Windows Username and Password to be authenticated against it and then receive a challenge for a one-time password (or device push).  If this second factor of authentication is successful, I will be automatically signed into Horizon.

Configuring Duo

The Duo environment needs to be configured before Horizon MFA can be set up.  Duo requires an on-premises authentication proxy.  This proxy acts as a RADIUS server, and it can run on Windows or Linux.  The steps for installing the Duo authentication proxy are beyond the scope of this article.

Once the authentication proxy is installed, it needs to be configured.  Duo has a number of options for configuration, and it’s important to review the documentation.  I’ll highlight a few of these options below.

The first thing we need to configure is the Duo client.  The client determines what type of primary authentication is performed.  Duo supports three methods:

  • ad_client: Duo uses Active Directory for primary authentication.  Users sign in with their Active Directory username and password.
  • radius_client: Duo uses the specified RADIUS server, such as Microsoft NPS or Cisco ACS, for primary authentication.
  • duo_only_client: Duo does not perform primary authentication.  The authentication proxy only performs secondary authentication, and primary authentication is handled by the configured system.

I don’t have a RADIUS server configured in my lab, so I did testing with both the ad_client and the duo_only_client. I prefer a single sign-on solution in my lab, so this setup will be configured with the ad_client.

The authentication proxy configuration is contained in the authproxy.cfg file located in C:\Program Files (x86)\Duo Security Authentication Proxy\conf.  Open this file in a text editor and add the following lines:

[ad_client]
host=DC.IP.0.1
host_2=DC.IP.0.2
service_account_username=serviceaccount
service_account_password=serviceaccountpassword
search_dn=DC=domain,DC=com

Duo includes a number of options for configuring an Active Directory client, including encrypted passwords, LDAPS support, and other security features.  The configuration above is a simple configuration meant for a lab or PoC environment.

Before the RADIUS server settings can be configured, a new application needs to be created in the Duo Administration Console.  The steps for this are:

1. Log into the Duo Admin Console.

2. Click Applications

3. Click Protect an Application

4. Scroll down to VMware View and select “Protect this Application.”

5. Copy the Integration Key, Secret Key, and API Hostname.

6. Change the username normalization option to “Simple.”

7. Click “Save Changes.”

The next step is to configure the authentication proxy as a RADIUS service.  Duo also has a number of options for this.  These determine how Duo interacts with the client service.  These options include:

  • RADIUS_Auto: The user’s authentication factor, and the device that receives it, are automatically selected by Duo.
  • RADIUS_Challenge: The user receives a textual challenge after primary authentication is complete.  The user then selects the authentication factor and the device on which to receive it.
  • RADIUS_Duo_Only: The RADIUS service does not handle primary authentication, and the user’s passcode or factor choice is used as the RADIUS password.

Duo also has multiple types of authentication factors.  These options are:

  • Passcode: A time-based one-time password that is generated by the mobile app.
  • Push: A challenge is pushed to the user’s mobile device with the Duo mobile app installed.  The user approves the access request to continue sign-in.
  • Phone Call: Users can opt to receive a phone call with their one-time passcode.
  • SMS: Users can opt to receive a text message with their one-time passcode.

RADIUS_Challenge will be used for this setup.  In my testing, the combination of an ad_client and RADIUS_Auto meant that my second authentication factor was a push to the mobile app.  In most cases, this was fine, but in situations where my laptop was online and my phone was not (such as on an airplane), I was unable to access the environment.  The other option is to use RADIUS_Duo_Only with the Duo_Only_Client, but I wanted a single sign-on experience.

The RADIUS server configuration for the Horizon Unified Access Gateway is:

[radius_server_challenge]
ikey=[random key generated by the Duo Admin Console]
skey=[random key generated by the Duo Admin Console]
api_host=api-xxxx.duosecurity.com [generated by the Duo Admin Console]
failmode=secure
client=ad_client
radius_ip_1=IP Address or Range
radius_secret_1=RadiusSecret
port=1812
prompt_format=short

The above configuration is set to fail secure.  This means that if the Duo service is not available, users will be unable to log in.  The other option that was selected was a short prompt format.  This displays a simple text message with the options that are available and prompts the user to select one.

Save the authproxy.cfg file and restart the Duo Authentication Proxy service for the new settings to take effect.

The next step is to configure the Unified Access Gateway to use RADIUS.  The following lines need to be added to the [Horizon] section:

authMethods=radius-auth
matchWindowsUserName=true
windowsSSOEnabled=true

A new section needs to be added to handle the RADIUS configuration.  The following lines need to be added to create the RADIUS section and configure the RADIUS client on the Unified Access Gateway.  If more than one RADIUS server exists in the environment, you can add a second set of entries with _2 appended to hostName and authPort, as sketched after the block below.

[RADIUSAuth]
hostName=IP.to.RADIUS.Server [IP of the primary RADIUS server]
authType=PAP
authPort=1812
radiusDisplayHint=XXX Token
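
For a second authentication proxy, the extra entries would look roughly like this (a sketch following the _2 suffix convention described above; the hostname value is a placeholder):

hostName_2=IP.of.second.RADIUS.Server
authPort_2=1812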

Save the UAG configuration file and deploy, or redeploy, your Unified Access Gateway.  When the deployment completes and you log in externally, you will be prompted for your one-time password first and then, if that succeeds, for your Active Directory credentials.

Duo-Only Authentication

So what if I don’t want an AD-integrated single sign-on experience?  What if I just want users to enter a code before signing in with their AD credentials?  Duo supports that configuration as well.  There are a couple of differences compared to the configuration for the AD-enabled Duo environment.

The first change is in the Duo authentication proxy config file.  No AD client configuration is required.  Instead of an [ad_client] section, a single line entry of [duo_only_client] is required.  The RADIUS Server configuration can also mostly stay the same.  The only required changes are to change the client from ad_client to duo_only_client and the section name from [radius_server_challenge] to [radius_server_duo_only].
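
A minimal sketch of the resulting Duo-only authproxy.cfg, assuming the same application keys generated earlier (the bracketed values are placeholders):

[duo_only_client]

[radius_server_duo_only]
ikey=[random key generated by the Duo Admin Console]
skey=[random key generated by the Duo Admin Console]
api_host=api-xxxx.duosecurity.com [generated by the Duo Admin Console]
failmode=secure
client=duo_only_client
radius_ip_1=IP Address or Range
radius_secret_1=RadiusSecret
port=1812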

The Access Point configuration will change.  The following configuration items should be used instead:

authMethods=radius-auth && sp-auth
matchWindowsUserName=true
windowsSSOEnabled=false

 


by seanpmassey at March 28, 2017 01:30 PM

LZone - Sysadmin

RabbitMQ Does Not Start: init terminating in do_boot

If you have a RabbitMQ cluster and a crashed node fails to start again with
{"init terminating in do_boot",{undef,[{rabbit_prelaunch,start,[]},{init,start_it,1},{init,start_em,1}]}}
in /var/log/rabbitmq/startup_log and something like
Error description:
   {could_not_start,rabbitmq_management,
       {{shutdown,
            {failed_to_start_child,rabbit_mgmt_sup,
                {'EXIT',
                    {{shutdown,
                         [{{already_started,<9180.461.0>},
                           {child,undefined,rabbit_mgmt_db,
                               {rabbit_mgmt_db,start_link,[]},
                               permanent,4294967295,worker,
                               [rabbit_mgmt_db]}}]},
                     {gen_server2,call,
                         [<0.427.0>,{init,<0.425.0>},infinity]}}}}},
        {rabbit_mgmt_app,start,[normal,[]]}}}

Log files (may contain more information): /var/log/rabbitmq/rabbit@rabbit-01.log /var/log/rabbitmq/rabbit@rabbit-01-sasl.log
in /var/log/rabbitmq/rabbit@rabbit-01.log, then you might want to drop the node from the cluster by running
rabbitmqctl forget_cluster_node rabbit@rabbit-01
on a working cluster node and rejoin the node by running
rabbitmqctl join_cluster rabbit@rabbit-02
on the disconnected node (given rabbit-02 is a working cluster member).
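
On the disconnected node, the join usually needs the Rabbit application stopped and restarted around it; a sketch of the full sequence, using the standard rabbitmqctl commands:

rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@rabbit-02
rabbitmqctl start_app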

Note: Doing this might make you lose messages!

March 28, 2017 10:46 AM

March 27, 2017

Sean's IT Blog

Horizon 7.0 Part 13 – Deploying The Horizon Access Point

In the last part of this series, I walked through the different remote access options for a Horizon 7 environment.  In this post, I’ll cover how to install and configure an Access Point appliance for a Horizon environment.

Note: The Access Point appliance has been renamed to the Unified Access Gateway as of Horizon 7.1.  This post began before the product was renamed, and the old naming convention will be used.

Before we go into deploying the appliance, let’s dive into what the appliance does and how it’s built.

As I said in the previous post, the Access Point is a purpose built virtual appliance that is designed to be the remote access component for VMware Horizon, VMware Identity Manager, and Airwatch.  The appliance is hardened for deployment in a DMZ scenario, and it is designed to only pass authorized traffic from authenticated users into a secure network.  In some ways, the Access Point is designed to replace VPNs, but it doesn’t provide full access to an internal network like a VPN would.

When deploying an Access Point, I highly recommend using the PowerShell Deployment Script.  This script was written by Mark Benson, the lead developer of the Access Point.  The script uses an INI configuration file that can be customized for each appliance that is deployed.  I like the PowerShell script over deploying the appliance through vCenter because the appliance is ready to use on first boot, it allows administrators to track all configurations in a source control system such as Github or Bitbucket Server, and this provides both documentation for the configuration and change tracking.  It also makes it easy to redeploy or upgrade the access point because I rerun the script with my config file and the new OVA file.

The PowerShell script requires the OVF Tool to be installed on the server or desktop where the PowerShell script will be executed.  The latest version of the OVF Tool can be downloaded from the My VMware site.  PowerCLI is not required when deploying the Access Point as OVF Tool will be deploying the Access Point and injecting the configuration.

The steps for deploying the Access Point are:

1. Download the PowerShell deployment script for the version of the Access Point you will be deploying.  You can download the script from here.

2.  Right click on the downloaded zip file and select Properties.

3. Click Unblock.  This step is required because the file was downloaded from the Internet and is untrusted by default, which can prevent the script from executing.

4. Extract the contents of the downloaded ZIP file to a folder on the system where the deployment script will be run.  The ZIP file contains the apdeploy.ps1 script file and five INI template files.  As of January 2017, four of the template files are example configurations for Horizon, and one is a sample configuration for vIDM. 

When deploying the access points for Horizon, I recommend starting with the AP2-Advanced.ini template.  This template provides the most options for configuring Horizon remote access and networking.  Once you have the AP deployed successfully, I recommend copying the relevant portions of the SecurID or RADIUS auth templates into your working AP template.  This allows you to test remote access and your DMZ networking and routing before adding in MFA.

5. Before we start filling out the template for our first access point, there are some things we’ll need to do to ensure a successful deployment. These steps are:

A. Ensure that the OVF Tool is installed on your deployment machine.

B. Locate the Access Point’s OVA file and record the full file path.  The OVA file can be placed on a network share.

C. We will need a copy of the certificate, including any intermediate and root CA certificates, and the private key in PEM format.  The certificate files should be concatenated so that the certificate and any CA certificates in the chain are in one file, and the private key should not have a password on it (see the sketch after this list).  Place these files into a local or network folder and record the full path.

D. We need to create the path to the vSphere resources that OVF Tool will use when deploying the appliance.  This path looks like: vi://user:PASSWORD@vcenter.fqdn.or.IP/DataCenter Name/host/Host or Cluster Name/

Do not replace the uppercase PASSWORD with any value.  This is an OVF Tool variable that prompts the user for a password before deploying the appliance.  OVF Tool is case sensitive, so make sure that the datacenter name and host or cluster names are entered as they are displayed in vCenter.

E. Generate the passwords that you will use for the appliance root and admin accounts.

F. Get the SSL Thumbprint for the certificate on your Connection Server or load balancer that is in front of the connection servers.
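
As a rough sketch of the certificate preparation from step C (all of the file names below are hypothetical), the chain can be concatenated and the passphrase stripped from the key with standard tools:

cat horizon.crt intermediate.crt root.crt > horizon-chain.pem
openssl rsa -in horizon.key -out horizon-key-nopass.pem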

6. Fill out the template file.  The file has comments for documentation, so it should be pretty easy to fill out.  There are a couple of things that I’ve noticed when deploying the access point using this method.  You need to have a valid port group for all three networks, even if you are only using the OneNic deployment option. 

7. Save your INI file as <APName>.ini in the same directory as the deployment scripts.

8. Open PowerShell and change to the directory where the deployment scripts are stored.

9. Run the deployment script.  The syntax is .\APDeploy.ps1 -inifile <apname>.ini

10. Enter the appliance root password twice.

11.  Enter the admin password twice.  This password is optional; however, if one is not configured, the REST API and admin interface will not be available.

12.  If RADIUS is configured in the INI file, you will be prompted for the RADIUS shared secret.

13. After the script opens the OVA and validates the manifest, it will prompt you for the password for accessing vCenter.  Enter it here.

14. If an access point with the same name is already deployed, it will be powered off and deleted.

15. The appliance OVA will be deployed.  When the deployment is complete, the appliance will be powered on and get an IP address from DHCP.

16. The appliance configuration defined in the INI file will be injected into the appliance.  It may take a few minutes for configuration to be completed.


Testing the Access Point

Once the appliance has finished its deployment and self-configuration, it needs to be tested to ensure that it is operating properly. The best way that I’ve found for doing this is to use a mobile device, such as a smartphone or cellular-enabled tablet, to access the environment using the Horizon mobile app.  If everything is working properly, you should be prompted to sign in, and desktop pool connections should be successful.

If you are not able to sign in, or you can sign in but not connect to a desktop pool, the first thing to check is your firewall rules.  Validate that TCP and UDP ports 443 and 4172 are open between the Internet and your Access Point.  You may also want to check your network routes in your configuration file.  If your desktops live in a different subnet than your access points and/or your connection servers, you may need to statically define your routes.  An example of a route configuration may look like the following:

routes1 = 192.168.2.0/24 192.168.1.1,192.168.3.0/24 192.168.1.1

If you need to make a routing change, the best way to handle it is to update the ini file and then redeploy the appliance.


by seanpmassey at March 27, 2017 01:30 PM

HolisticInfoSec.org

Toolsmith #124: Dripcap - Caffeinated Packet Analyzer

Dripcap is a modern, graphical packet analyzer based on Electron.
Electron, you say? "Electron is a framework for creating native applications with web technologies like JavaScript, HTML, and CSS. It takes care of the hard parts so you can focus on the core of your application."
We should all be deeply familiar with the venerable Wireshark, as it has long been the forerunner for packet analysts seeking a graphical interface to their PCAPs. Occasionally though, it's interesting to explore alternatives. I've long loved NetworkMiner, and the likes of Microsoft Message Analyzer and Xplico each have unique benefits.
Basic users comfortable with Wireshark will likely find Dripcap somewhat rudimentary at this stage, but it does give you opportunities to explore packet captures at a fundamental level and learn without some of the feature crutches more robust tools offer.
However, for JavaScript developers, Dripcap opens up a whole other world of possibilities. Give the Create NTP dissector package tutorial a read; you can create, then publish and load dissector packages (and others) of your choosing.

Installation
I built Dripcap from source on Windows as follows, using Chocolatey.
From an administrator PowerShell prompt (ensure Get-ExecutionPolicy is not Restricted), execute the following (restart your admin PS prompt after #2):
  1. iwr https://chocolatey.org/install.ps1 -UseBasicParsing | iex
  2. choco install git make jq nodejs
  3. git clone https://github.com/dripcap/dripcap.git
  4. cd dripcap
  5. npm install -g gulp node-gyp babel-cli
  6. npm install
  7. gulp
Step 1 installs Chocolatey, step 2 uses Chocolatey to install tools, step 3 clones Dripcap, step 4 changes into the repository, steps 5 & 6 install packages, and step 7 builds it all.
Execute dripcap, and you should be up and running.
You can also use npm, part of the Node.js package ecosystem, to install the Dripcap CLI with npm install -g dripcap, or just download dripcap-windows-amd64.exe from Dripcap Releases.

Experiment 

I'll walk you through packet carving of sorts with Dripcap. One of Dripcap's strongest features is its filtering capabilities. I used an old PCAP with an Operation Aurora Internet Explorer exploit (CVE-2010-0249) payload for this tool test.
Ctrl+O will Import Pcap File for you.

Click Developer, then Toggle Log Panel for full logging.

Figure 1: Dripcap
You'll note four packets with lengths of 1514, as seen in Figure 1. Exploring the first of these packets indicates just what we'd expect: an Ethernet MTU (maximum transmission unit) of 1500 bytes, and a TCP payload of 1460 bytes, leaving 40 bytes for our header (20 byte IP and 20 byte TCP).

Figure 2: First large packet
 Hovering your mouse over the TCP details in the UI will highlight all the TCP specific data, but you can take such actions a step further. First, let's filter down to just the large packets with tcp.payload.length == 1460.
Figure 3: Filtered packets
 With our view reduced we can do some down and dirty carving pretty easily with Dripcap. In each of the four filtered packets I hovered over Payload 1460 bytes as seen in Figure 4, which highlighted the payload-specific hex. I then used the mouse to capture the highlighted content and, using Dripcap's Edit and Copy, grabbed only that payload-specific hex and pasted it to a text file.
Figure 4: Hex payload
I did this with each of these four packets and copied content, one hex blob after the other, into my text file, in tight, seamless sequence. I then used Python Tools for Visual Studio to do a quick hexadecimal to ASCII translation as easily as bytearray.fromhex("my hex snippet here").decode(). The result, in Figure 5, shows the resulting JavaScript payload that exploits CVE-2010-0249.
Figure 5: ASCII results
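A minimal standalone sketch of that conversion, assuming the concatenated hex was saved to a file named payload_hex.txt (a hypothetical name):

# hex2ascii.py - decode a file of concatenated hex bytes into ASCII text
with open("payload_hex.txt") as f:
    hex_blob = "".join(f.read().split())  # drop whitespace between pasted blobs
print(bytearray.fromhex(hex_blob).decode("ascii", errors="replace"))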
You can just as easily use online converters as well. I saved the ASCII results to a text file in a directory which I had excluded from my anti-malware protection. After uploading the file to VirusTotal as payload.txt, my expectations were confirmed: 32 of 56 AV providers detected the file as the likes of Exploit:JS/Elecom.D or, more to the point, Exploit.JS.Aurora.a.

In closing
Perhaps not the most elegant method, but it worked quickly and easily with Dripcap's filtering and editing functions. I hope to see this tool, and its community, continue to grow. Build dissector packages, create themes, become part of the process; it's always good to see alternatives available to security practitioners.
Cheers...until next time.

by Russ McRee (noreply@blogger.com) at March 27, 2017 04:49 AM

March 26, 2017

Evaggelos Balaskas

swapfile on centos7

Working with a VPS (Virtual Private Server) sometimes means that you don't have a lot of memory.

That's why we use a swap partition, a system partition that the Linux kernel uses as extended memory. It's slow but necessary when your system needs more memory. Even if you don't have a free disk partition, you can add a swap file to your Linux system.

Create the Swap File


[root@centos7] # dd if=/dev/zero of=/swapfile count=1000 bs=1MiB
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 3.62295 s, 289 MB/s

[root@centos7] # du -sh /swapfile
1.0G    /swapfile

That is a 1 GB file.

Make Swap


[root@centos7] # mkswap -L swapfs /swapfile
Setting up swapspace version 1, size = 1048572 KiB
LABEL=swapfs, UUID=d8af8f19-5578-4c8e-b2b1-3ff57edb71f9

Permissions


[root@centos7] # chmod 0600 /swapfile

Activate


[root@centos7] # swapon /swapfile

Check

# free
              total        used        free      shared  buff/cache   available
Mem:        1883716     1613952       79172       54612      190592       64668
Swap:       1023996           0     1023996

fstab

Now for the final step, we need to edit /etc/fstab

/swapfile   swap    swap    defaults    0   0
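
To verify the fstab entry without rebooting, you can deactivate the swap file and re-activate everything listed in fstab (an optional check, not part of the original steps):

[root@centos7] # swapoff /swapfile
[root@centos7] # swapon -a
[root@centos7] # swapon -s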
Tag(s): swap, centos7

March 26, 2017 06:55 PM

March 23, 2017

Carl Chenet

Feed2tweet 1.0, tool to post RSS feeds to Twitter, released

Feed2tweet 1.0, a self-hosted Python app to automatically post RSS feeds to the Twitter social network, was released on March 23rd, 2017.

The main new feature of this release allows you to create filters for each RSS feed; previously you could only define global filters. Thanks to a contribution from Antoine Beaupré, Feed2tweet is also able to use syslog, starting from this release.

fiestaWhat’s the purpose of Feed2tweet?

Some online services offer to convert your RSS entries into Twitter posts. These services are usually unreliable, slow, and don't respect your privacy. Feed2tweet is a self-hosted Python app; the source code is easy to read, and the official documentation is available online with lots of examples.

Twitter Out Of The Browser

Have a look at my Github account for my other Twitter automation tools:

  • Retweet: retweet all tweets (or a filtered subset) from one Twitter account to another to spread content.
  • db2twitter: gets data from a SQL database (several supported), builds tweets and sends them to Twitter.
  • Twitterwatch: monitors the activity of your Twitter timeline and warns you if no new tweet appears.

What about you? Do you use tools to automate the management of your Twitter account? Feel free to give me feedback in the comments below.

… and finally

You can help Feed2tweet by donating anything through Liberapay (also possible with cryptocurrencies). That's a big motivation factor 😉

 

by Carl Chenet at March 23, 2017 11:00 PM

TaoSecurity

Five Reasons I Want China Running Its Own Software

Periodically I read about efforts by China, or Russia, or North Korea, or other countries to replace American software with indigenous or semi-indigenous alternatives. I then reply via Twitter that I love the idea, with a short reason why. This post will list the top five reasons why I want China and other likely targets of American foreign intelligence collection to run their own software.

1. Many (most?) non-US software companies write lousy code. The US is by no means perfect, but our developers and processes generally appear to be superior to foreign indigenous efforts. Cisco vs Huawei is a good example. Cisco has plenty of problems, but it has processes in place to manage them, plus secure code development practices. Lousy indigenous code means it is easier for American intelligence agencies to penetrate foreign targets. (An example of a foreign country that excels in writing code is Israel, but thankfully it is not the same sort of priority target like China, Russia, or North Korea.)

2. Many (most?) non-US enterprises are 5-10 years behind US security practices. Even if a foreign target runs decent native code, the IT processes maintaining that code are lagging compared to American counterparts. Again, the US has not solved this problem by any stretch of the imagination. However, relatively speaking, American inventory management, patch management, and security operations have the edge over foreign intelligence targets. Because non-US enterprises running indigenous code will not necessarily be able to benefit from American expertise (as they might if they were running American code), these deficiencies will make them easier targets for foreign exploitation.

3. Foreign targets running foreign code is win-win for American intel and enterprises. The current vulnerability equities process (VEP) puts American intelligence agencies in a quandary. The IC develops a zero-day exploit for a vulnerability, say for use against Cisco routers. American and Chinese organizations use Cisco routers. Should the IC sit on the vulnerability in order to maintain access to foreign targets, or should it release the vulnerability to Cisco to enable patching and thereby protect American and foreign systems?

This dilemma disappears in a world where foreign targets run indigenous software. If the IC identifies a vulnerability in Cisco software, and the majority of its targets run non-Cisco software, then the IC is more likely (or should be pushed to be more likely) to assist with patching the vulnerable software. Meanwhile, the IC continues to exploit Huawei or other products at its leisure.

4. Writing and running indigenous code is the fastest way to improve. When foreign countries essentially outsource their IT to vendors, they become program managers. They lose or never develop any ability to write and run quality software. Writing and running your own code will enroll foreign organizations in the security school of hard knocks. American intel will have a field day for 3-5 years against these targets, as they flail around in a perpetual state of compromise. However, if they devote the proper native resources and attention, they will learn from their mistakes. They will write and run better software. Now, this means they will become harder targets for American intel, but American intel will retain the advantage of point 3.

5. Trustworthy indigenous code will promote international stability. Countries like China feel especially vulnerable to American exploitation. They have every reason to be scared. They run code written by other organizations. They don't patch it or manage it well. Their security operations stink. The American intel community could initiate a complete moratorium on hacking China, and the Chinese would still be ravaged by other countries or criminal hackers, all the while likely blaming American intel. They would not be able to assess the situation. This makes for a very unstable situation.

Therefore, countries like China and others are going down the indigenous software path. They understand that software, not oil as Daniel Yergin once wrote, is now the "commanding heights" of the economy. Pursuing this course will subject these countries to many years of pain. However, in the end I believe it will yield a more stable situation. These countries should begin to perceive that they are less vulnerable. They will experience their own vulnerability equity process. They will be more aware and less paranoid.

In this respect, indigenous software is a win for global politics. The losers, of course, are global software companies. Foreign countries will continue to make short-term deals to suck intellectual property and expertise from American software companies, before discarding them on the side of Al Gore's information highway.

One final point -- a way foreign companies could jump-start their indigenous efforts would be to leverage open source software. I doubt they would necessarily honor licenses which require sharing improvements with the open source community. However, open source would give foreign organizations the visibility they need and access to expertise that they lack. Microsoft's shared source and similar programs were a step in this direction, but I suggest foreign organizations adopt open source instead.

Now, widespread open source adoption by foreign intelligence targets would erode the advantages for American intel that I explained in point 3. I'm betting that foreign leaders are likely similar to Americans in that they tend to not trust open source, and prefer to roll their own and hold vendors accountable. Therefore I'm not that worried, from an American intel perspective, about point 3 being vastly eroded by widespread foreign open source adoption.

TeePublic is running a sale until midnight ET Thursday! Get a TaoSecurity Milnet T-shirt for yourself and a friend!


by Richard Bejtlich (noreply@blogger.com) at March 23, 2017 02:26 PM

The Missing Trends in M-Trends 2017

FireEye released the 2017 edition of the Mandiant M-Trends report yesterday. I've been a fan of this report since the 2010 edition, before I worked at the company.

Curiously for a report with the name "trends" in the title, this and all other editions do not publish the sorts of yearly trends I would expect. This post will address that limitation.

The report is most famous for its "dwell time" metric, which is the median (not average, or "mean") number of days an intruder spends inside a target company until he is discovered.

Each report lists the statistic for the year in consideration, and compares it to the previous year. For example, the 2017 report, covering incidents from 2016, notes the dwell time has dropped from 146 days in 2015, to 99 days in 2016.

The second most interesting metric (for me) is the split between internal and external notification. Internal notification means that the target organization found the intrusion on its own. External notification means that someone else informed the target organization. The external party is often a law enforcement or intelligence agency, or a managed security services provider. The 2016 split was 53% internal vs 47% external.

How do these numbers look over the years that the M-Trends report has been published? Inquiring minds want to know.

The 2012 M-Trends report was the first edition to include these statistics. I have included them for that report and all subsequent editions in the table below.

Year  Dwell time (days)  Internal (%)  External (%)
2011  416                 6            94
2012  243                37            63
2013  229                33            67
2014  205                31            69
2015  146                47            53
2016   99                53            47

As you can see, all of the numbers are heading in the right direction. We are finally into double digits for dwell time, but over 3 months is still far too long. Internal detection continues to rise as well. This is a proxy for the maturity of a security organization, in my opinion.

Hopefully future M-Trends reports will include tables like this.


by Richard Bejtlich (noreply@blogger.com) at March 23, 2017 12:43 PM

March 22, 2017

OpenSSL

Licensing Update

The following is a press release that we just released, with the cooperation and financial support of the Core Infrastructure Initiative and the Linux Foundation.

In the next few days we’ll start sending out email to all contributors asking them to approve the change. In the meantime, you can visit the licensing website and search for your name and request the email. If you have changed email addresses, or want to raise other issues about the license change, please email license@openssl.org. You can also post general issues to openssl-users@openssl.org.

We are grateful to all the contributors who have contributed to OpenSSL and look forward to their help and support in this effort.

The official press release can be found at the CII website. The rest of this post is a copy:

OpenSSL Re-licensing to Apache License v. 2.0 To Encourage Broader Use with Other FOSS Projects and Products

OpenSSL Launches New Website to Organize Process, Seeks to Contact All Contributors

SAN FRANCISCO, March 16, 2017 – The OpenSSL project, home of the world’s most popular SSL/TLS and cryptographic toolkit, is changing its license to the Apache License v 2.0 (ASL v2). As part of this effort, the OpenSSL team launched a new website and has been working with various corporate collaborators to facilitate the re-licensing process.

“This re-licensing activity will make OpenSSL, already the world’s most widely-used FOSS encryption software, more convenient to incorporate in the widest possible range of free and open source software,” said Mishi Choudhary, Legal Director of Software Freedom Law Center (SFLC) and counsel to OpenSSL. “OpenSSL’s team has carefully prepared for this re-licensing, and their process will be an outstanding example of ‘how to do it right.’ SFLC is pleased to have been able to help the team bring this process to this point, and looks forward to its successful and timely completion.”

The website will aid the OpenSSL team’s efforts to contact everyone who has contributed to the project so far, which includes nearly 400 individuals with a total of more than 31,000 commits. The current license dates back to the 1990’s and is more than 20 years old. The open source community has grown and changed since then, and has mostly settled on a small number of standard licenses.

After careful review, consultation with other projects, and input from the Core Infrastructure Initiative and legal counsel from the SFLC, the OpenSSL team decided to relicense the code under the widely-used ASLv2.

“The Linux Foundation is excited to see the OpenSSL project re-licensing under the Apache License,” said Nicko van Someren, Chief Technology Officer, the Linux Foundation. “Using a standard and well-understood license is a huge benefit when incorporating a FOSS project into other projects and products. OpenSSL has made huge progress in recent years, in part through support from the Linux Foundation’s Core Infrastructure Initiative, and this license move will further help to ensure it remains one of the most important and relied-upon open source projects in the world.”

The website contains a list of every email address mentioned in every single commit, a searchable database of authors, and the ability to send email and approve the license change. Because email addresses change, the website will also be updated over time to record email bounces and the names of people the project is still trying to reach.

“Oracle is proud to extend its collaboration with the OpenSSL Foundation by relicensing its contributions of elliptic curve cryptography,” said Jim Wright, Chief Architect of Open Source Policy, Strategy, Compliance and Alliances, Oracle. “OpenSSL is a critical component in both Oracle products and the infrastructure of the Internet, and we strongly believe the increased use of cryptography fostered by OpenSSL will benefit the entire enterprise software community.”

“Intel is thrilled to see OpenSSL moving to the standard Apache 2.0 license, improving license compatibility within the Open Source ecosystem,” said Imad Sousou, Vice President and General Manager of the Open Source Technology Center, Intel. “This will help defragment the open source cryptography ecosystem, leading to stronger and more pervasive use of crypto to improve privacy and security in the global technology infrastructure.”

Additional details on the decision to move to ASL v. 2.0 are available here. For progress updates on re-licensing, which is expected to take several months, check the website and project mailing lists.

To reach the OpenSSL team involved in this effort, email license@openssl.org. The team also asks that anyone who knows of other people that should be contacted, such as “silent collaborators” on code contributions, to also send email.

March 22, 2017 12:00 PM

March 21, 2017

R.I.Pienaar

Choria Network Protocols – Transport

The old MCollective protocols are now ancient and were quite Ruby slanted – full of Symbols, using YAML, and quite language specific. In Choria I’d like to support other programming languages, REST gateways and so forth, so a rethink was needed.

I’ll look at the basic transport protocol used by the Choria NATS connector. It’s quite unusual to speak of Network Protocols when dealing with messages on a broker, but for MCollective it is exactly that – a Network Protocol.

The messages need enough information for strong AAA, they need to have an agreed on security structure and within them live things like RPC requests. So a formal specification is needed which is exactly what a Protocol is.

While creating Choria the entire protocol stack has been redesigned on every level except the core MCollective messages – Choria maintains a small compatibility layer to make things work. To really achieve my goal I’d need to downgrade MCollective to pure JSON data at which point multi language interop should be possible and easy.

Networks are Onions


Network protocols tend to come in layers, one protocol within another within another. The nearer you go to the transport the more generic it gets. This is true for HTTP within TCP within IP within Ethernet and likewise it’s true for MCollective.

Just as TCP/IP can carry both HTTP and FTP, one MCollective network can carry many protocols like the RPC one; a typical MCollective install uses 2 protocols at this innermost layer. You can even make your own – the entire RPC system is a plugin!

( middleware protocol
  ( transport packet that travels over the middleware
      ( security plugin internal representation
        ( mcollective core representation that becomes M::Message
          ( MCollective Core Message )
          ( RPC Request, RPC Reply )
          ( Other Protocols, .... )
        )
      )
    )
  )
)

Here you can see when you do mco rpc puppet status you’ll be creating a RPC Request wrapped in a MCollective Message, wrapped in a structure the Security Plugin dictates, wrapped in a structure the Connector Plugin dictates and from there to your middleware like NATS.

Today I’ll look at the Transport Packet since that is where Network Federation lives which I spoke about yesterday.

Transport Layer


The Transport Layer packets are unauthenticated and unsigned; for MCollective, security happens in the packet carried within the transport, so this is fine. It’s not inconceivable that a Federation might only want to route signed messages and it’s quite easy to add later if needed.

Of course the NATS daemons will only accept TLS connections from certificates signed by the CA so these network packets are encrypted and access to the transport medium is restricted, but the JSON data you’ll see below is sent as is.

In all the messages shown below you’ll see a seen-by header, this is a feature of the NATS Connector Plugin that records the connected NATS broker, we’ll soon expose this information to MCollective API clients so we can make a traceroute tool for Federations. This header is optional and off by default though.

I’ll show messages in Ruby format here but it’s all JSON on the wire.

Message Targets


First it’s worth knowing where things are sent on the NATS clusters. The targets used by the NATS connector is pretty simple stuff, there will no doubt be scope for improvement once I look to support NATS Streaming but for now this is adequate.

  • Broadcast Request for agent puppet in the mycorp sub collective – mycorp.broadcast.agent.puppet
  • Directed Request to a node for any agent in the mycorp sub collective – mycorp.node.node1.example.net
  • Reply to a node identity dev1.example.net with pid 9999 and a message sequence of 10 – mycorp.reply.node1.example.net.9999.10

As the Federation Brokers are independent of Sub Collectives they are not prefixed with any collective specific token:

  • Requests from a Federation Client to a Federation Broker Cluster called production – choria.federation.production.federation, queue group production_federation
  • Replies from the Collective to a Federation Broker Cluster called production – choria.federation.production.collective, queue group production_collective
  • production cluster Federation Broker Instances publish statistics – choria.federation.production.stats

These names are designed so that in smaller setups or in development you could use a single NATS cluster with Federation Brokers between standalone collectives. Not really a recommended thing but it helps in development.

Unfederated Messages


Your basic Unfederated Message is pretty simple:

{
  "data" => "... any text ...",
  "headers" => {
    "mc_sender" => "dev1.example.net",
    "seen-by" => ["dev1.example.net", "nats1.example.net"],
    "reply-to" => "mcollective.reply.dev1.example.net.999999.0",
  }
}
  • it is a discovery request within the sub collective mcollective and would be published to mcollective.broadcast.agent.discovery.
  • it is sent from a machine identifying as dev1.example.net
  • we know it’s traveled via a NATS broker called nats1.example.net.
  • responses to this message need to travel via NATS using the target mcollective.reply.dev1.example.net.999999.0.

The data is completely unstructured as far as this message is concerned – it just needs to be some text, so base64 encoded data is common. All the transport cares about is getting this data to its destination with metadata attached; it does not care what’s in the data.

The reply to this message is almost identical:

{
  "data" => "... any text ...",
  "headers" => {
    "mc_sender" => "dev2.example.net",
    "seen-by" => ["dev1.example.net", "nats1.example.net", "dev2.example.net", "nats2.example.net"],
  }
}

This reply will travel via mcollective.reply.dev1.example.net.999999.0; we know that the node dev2.example.net is connected to nats2.example.net.

We can create a full traceroute like output with this which would show dev1.example.net -> nats1.example.net -> nats2.example.net -> dev2.example.net

Federated Messages


Federation is possible because MCollective will just store whatever Headers are in the message and put them back on the way out in any new replies. Given this we can embed all the federation metadata and this metadata travels along with each individual message – so the Federation Brokers can be entirely stateless, all the needed state lives with the messages.

With Federation Brokers being clusters, this means your message request might flow over cluster member a but the reply can come via member b – and if it’s a stream of replies they will be load balanced across the members. The Federation Broker Instances do not need something like Consul or a shared store since all the data needed lives in the messages.

Lets look at the same Request as earlier if the client was configured to belong to a Federation with a network called production as one of its members. It’s identical to before except the federation structure was added:

{
  "data" => "... any text ...",
  "headers" => {
    "mc_sender" => "dev1.example.net",
    "seen-by" => ["dev1.example.net", "nats1.fed.example.net"],
    "reply-to" => "mcollective.reply.dev1.example.net.999999.0",
    "federation" => {
       "req" => "68b329da9893e34099c7d8ad5cb9c940",
       "target" => ["mcollective.broadcast.agent.discovery"]
    }
  }
}
  • it is a discovery request within the sub collective mcollective and would be published via a Federation Broker Cluster called production via NATS choria.federation.production.federation.
  • it is sent from a machine identifying as dev1.example.net
  • it’s traveled via a NATS broker called nats1.fed.example.net.
  • responses to this message need to travel via NATS using the target mcollective.reply.dev1.example.net.999999.0.
  • it’s federated and the client wants the Federation Broker to deliver it to its connected Member Collective on mcollective.broadcast.agent.discovery

The Federation Broker receives this and creates a new message that it publishes on its Member Collective:

{
  "data" => "... any text ...",
  "headers" => {
    "mc_sender" => "dev1.example.net",
    "seen-by" => [
      "dev1.example.net",
      "nats1.fed.example.net", 
      "nats2.fed.example.net", 
      "fedbroker_production_a",
      "nats1.prod.example.net"
    ],
    "reply-to" => "choria.federation.production.collective",
    "federation" => {
       "req" => "68b329da9893e34099c7d8ad5cb9c940",
       "reply-to" => "mcollective.reply.dev1.example.net.999999.0"
    }
  }
}

This is the same message as above: the Federation Broker recorded itself and its connected NATS server and produced a new message, but in this message it intercepts the replies by telling the nodes to send them to choria.federation.production.collective, and it records the original reply destination in the federation header.

A node that replies produces a reply; again this is very similar to the earlier reply, except the federation header comes back exactly as it was sent:

{
  "data" => "... any text ...",
  "headers" => {
    "mc_sender" => "dev2.example.net",
    "seen-by" => [
      "dev1.example.net",
      "nats1.fed.example.net", 
      "nats2.fed.example.net", 
      "fedbroker_production_a", 
      "nats1.prod.example.net",
      "dev2.example.net",
      "nats2.prod.example.net"
    ],
    "federation" => {
       "req" => "68b329da9893e34099c7d8ad5cb9c940",
       "reply-to" => "mcollective.reply.dev1.example.net.999999.0"
    }
  }
}

We know this node was connected to nats2.prod.example.net and you can see the Federation Broker would know how to publish this to the client – the reply-to is exactly what the Client initially requested, so it creates:

{
  "data" => "... any text ...",
  "headers" => {
    "mc_sender" => "dev2.example.net",
    "seen-by" => [
      "dev1.example.net",
      "nats1.fed.example.net", 
      "nats2.fed.example.net", 
      "fedbroker_production_a", 
      "nats1.prod.example.net",
      "dev2.example.net",
      "nats2.prod.example.net",
      "nats3.prod.example.net",
      "fedbroker_production_b",
      "nats3.fed.example.net"
    ],
  }
}

Which gets published to mcollective.reply.dev1.example.net.999999.0.

Route Records


You noticed above there’s a seen-by header, this is something entirely new and never before done in MCollective – and entirely optional and off by default. I anticipate you’d want to run with this off most of the time once your setup is done, it’s a debugging aid.

As NATS is a full mesh your message probably only goes one hop within the Mesh. So if you record the connected server you publish into and the connected server your message entered its destination from, you have a full route recorded.

The Federation Broker logs and MCollective Client and Server logs all include the message ID so you can do a full trace in message packets and logs.

There’s a PR against MCollective to expose this header to the client code so I will add something like mco federation trace some.node.example.net which would send a round trip to that node and tell you exactly how the packet travelled. This should help a lot in debugging your setups as they will now become quite complex.

The structure here is kind of meh and I will probably improve on it once the PR in MCollective lands and I can see what is the minimum needed to do a full trace.

By default I’ll probably record the identities of the MCollective bits when Federated and not at all when not Federated. But if you enable the setting to record the full route it will produce a record of MCollective bits and the NATS nodes involved.

In the end though from the Federation example we can infer a network like this:

Federation NATS Cluster

  • Federation Broker production_a -> nats2.fed.example.net
  • Federation Broker production_b -> nats3.fed.example.net
  • Client dev1.example.net -> nats1.fed.example.net

Production NATS Cluster:

  • Federation Broker production_a -> nats1.prod.example.net
  • Federation Broker production_b -> nats3.prod.example.net
  • Server dev2.example.net -> nats2.prod.example.net

We don’t know the details of all the individual NATS nodes that makes up the entire NATS mesh but this is good enough.

Of course this sample is the pathological case where nothing is connected to the same NATS instances anywhere. In my tests with a setup like this the overhead added across 10 000 round trips against 3 nodes – so 30 000 replies through 2 x Federation Brokers – was only 2 seconds; I couldn’t reliably measure a per message overhead as it was just too small.

The NATS gem does expose the details of the full mesh though, since NATS will announce its cluster members to clients; I might do something with that, not sure. Either way, auto generated network maps should be totally possible.

Conclusion


So this is how Network Federation works in Choria. It’s particularly nice that I was able to do this without needing any state on the cluster thanks to past self making good design decisions in MCollective.

Once the seen-by thing is figured out I’ll publish JSON Schemas for these messages and declare protocol versions.

I can probably make future posts about the other message formats but they’re a bit nasty as MCollective itself is not yet JSON safe; the plan is that it becomes JSON safe one day and the whole thing will become a lot more elegant. If someone pings me for this I’ll post it, otherwise I’ll probably stop here.

by R.I. Pienaar at March 21, 2017 10:15 PM

TaoSecurity

Cybersecurity Domains Mind Map

Last month I retweeted an image labelled "The Map of Cybersecurity Domains (v1.0)". I liked the way this graphic divided "security" into various specialties. At the time I did not do any research to identify the originator of the graphic.

Last night before my Brazilian Jiu-Jitsu class I heard some of the guys talking about certifications. They were all interested in "cybersecurity" but did not know how to break into the field. The domain image came to mind as I mentioned that I had some experience in the field. I also remembered an article Brian Krebs asked me to write titled "How to Break Into Security, Bejtlich Edition," part of a series on that theme. I wrote:

Providing advice on “getting started in digital security” is similar to providing advice on “getting started in medicine.” If you ask a neurosurgeon he or she may propose some sort of experiment with dead frog legs and batteries. If you ask a dermatologist you might get advice on protection from the sun whenever you go outside. Asking a “security person” will likewise result in many different responses, depending on the individual’s background and tastes.

I offered to help the guys in my BJJ class find the area of security that interests them and get started in that space. I thought the domains graphic might facilitate that conversation, so I decided to identify the originator so as to give proper credit.

It turns out that the CISO at Oppenheimer & Co, Henry Jiang, created the domains graphic. Last month at LinkedIn he published an updated Map of Cybersecurity Domains v2.0:

Map of Cybersecurity Domains v2.0 by Henry Jiang
If I could suggest a few changes for an updated version, I would try to put related disciplines closer to each other. For example, I would put the Threat Intelligence section right next to Security Operations. I would also swap the locations of Risk Assessment and Governance. Governance is closer to the Framework and Standard arena. I would also move User Education to be near Career Development, since both deal with people.

On a more substantive level, I am not comfortable with the Risk Assessment section. Blue Team and Red Team are not derivatives of a Penetration test, for example. I'm not sure how to rebuild that section.

These are minor issues overall. The main reason I like this graphic is that it largely captures the various disciplines one encounters in "cybersecurity." I could point a newcomer to the field at this image and ask "does any of this look interesting?" I could ask someone more experienced "in which areas have you worked?" or "in which areas would you like to work?"

The cropped image at the top of this blog shows the Security Operations and Threat Intelligence areas, where I have the most experience. Another security person could easily select a completely different section and still be considered a veteran. Our field is no longer defined by a small set of skills!

What do you think of this diagram? What changes would you make?

by Richard Bejtlich (noreply@blogger.com) at March 21, 2017 01:17 PM

March 20, 2017

R.I.Pienaar

Choria Network Federation

Running large or distributed MCollective networks has always been a pain. As much as middleware is an enabler, it starts actively working against you as you grow and as latency increases; this is felt especially when you have geographically distributed networks.

Federation has been discussed often in the past but nothing ever happened. NATS ended up forcing my hand because it only supports a full mesh mode, something that would not be suitable for a globe spanning network.

Overview


I spent the last week or two building Federation into the Choria network protocol, and later added a Federation Broker. Federation can be used to connect entirely separate collectives together into one from the perspective of a client.

Here we can see a distributed Federation of Collectives. Effectively London, Tokyo and New York are entirely standalone collectives: they are smaller, they have their own middleware infrastructure, and they function just like a normal collective, so clients can communicate with those isolated collectives as always.

I set up 5 node NATS meshes in every region. We then add a Federation Broker cluster that provides bridging services to a central Federation network. I’d suggest running one Federation Broker instance on each of your NATS nodes, but you can run as many as you like.
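To make that a bit more concrete, a single member of one of those regional NATS meshes could be started along these lines (hostnames and ports are placeholders of mine, not taken from the post):

# one member of a regional NATS mesh; each member lists its peers as routes
gnatsd -p 4222 --cluster nats://0.0.0.0:4248 \
  --routes nats-route://nats2.ldn.example.net:4248,nats-route://nats3.ldn.example.net:4248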

Correctly configured Clients that connect to the central Federation network will interact with all the isolated collectives as if they are one. All current MCollective features keep working and Sub Collectives can span the entire Federation.
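What a correctly configured client might look like is sketched below; the setting name is an assumption from memory rather than anything stated here, so treat it as hypothetical and check the Choria federation docs before relying on it:

# hypothetical client configuration sketch -- verify the setting name against the docs
cat >> /etc/puppetlabs/mcollective/client.cfg <<'EOF'
plugin.choria.federation.collectives = london, tokyo, new_york
EOF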

Impact


There are obvious advantages in large networks – instead of one giant 100 000 node middleware you now need to build 10 x 10 000 node networks, something that is a lot easier to do. With NATS, it’s more or less trivial.

Not so obvious is how this scales with respect to MCollective. MCollective has a mode called Direct Addressing where the client needs to create 1 message for every node targeted in the request. Generally very large requests are discouraged, so this works OK.

These messages, created on the client, end up having to travel individually all across the globe, and this is where it starts to hurt.

With Federation the client will divide the task of producing these per-node messages into groups of 200 and pass the request to the Federation Broker Cluster. The cluster will then, in a load shared fashion, do the work for the client.

Since the Federation Broker tends to be near the individual Collectives this yields a massive reduction in client work and network traffic. The Federation Broker Instances are entirely state free so you can run as many as you like and they will share the workload more or less evenly across them.

$ mco federation observe --cluster production
Federation Broker: production
 
Federation
  Totals:
    Received: 1024  Sent: 12288
 
  Instances:
    1: Received: 533 (52.1%) Sent: 6192 (50.4%)
    2: Received: 491 (47.9%) Sent: 6096 (49.6%)

Above you can see the client offloading the work onto a Federation Broker with 2 cluster members. The client sent 1024 messages but the broker sent 12288 messages on the client’s behalf. The 2 instances do a reasonable job of sharing the load of creating and federating the messages across them.

In my tests against large collectives this speeds up requests significantly and greatly reduces the client load.

In the simple broadcast case there is no speed up, but when doing 10 000 requests in a loop the overhead of Federation was about 2 seconds over the 10 000 requests – so hardly noticeable.

Future Direction


The Choria protocol supports Federation in a way that is not tied to its specific Federation Broker implementation. The basic POC Federation Broker was around 200 lines so not really a great challenge to write.

I imagine in time we might see a few options here:

  • You can use different CAs in various places in your Federated network. The Federation Broker, using Choria Security privileged certificates, can provide user id mapping and rewriting between the Collectives.
  • If you want to build SaaS management services on top of Choria, a Federated network provides a really safe way to reach into managed networks without exposing the collectives to each other in any way. A client in one member Collective cannot use the Federation Brokers to access another Collective.
  • Custom RBAC and Auditing schemes can be built at the Federation Broker layer, where requests can be introspected and only those matching policy are passed on to the managed Collective.
  • Federation is tailor made to provide protocol translation. Collectives speaking different protocols can be bridged together: an older MCollective SSL based collective can be reached from a Choria collective via a Federation Broker providing translation capabilities. Likewise, a Websocket interface to Collectives can be a Federation Broker listening on Websocket while speaking NATS on the other end.

The security implications are huge: isolated collectives with isolated CAs and unique user Auditing, Authorization and Authentication needs, bridged together via a custom RBAC layer that is horizontally scalable and stateless – that is quite a big deal.

Protocol translation is equally massive. As I move towards looking at ways to fork MCollective, given the lack of cooperation from Puppet Inc, this gives me a very solid way forward: people’s investment in older MCollective doesn’t have to be thrown away while I find a way to move forward.

Availability


This will be released in version 0.0.25 of the Choria module, which should be sometime this week. I’ve published pre-release docs already. Expect it to be deployable with very little effort via Puppet; given a good DNS setup it needs almost no configuration at all.

I’ll make a follow up post that explores the network protocol that made this possible to build with zero stored state in the Federation Broker Instances – a major achievement in my book.

UPDATE: All the gory protocol details are in a follow up post Choria Network Protocols – Transport.

by R.I. Pienaar at March 20, 2017 11:05 AM

March 19, 2017

Errata Security

Pranksters gonna prank

So Alfa Bank (the bank whose DNS traffic linked it to trump-email.com) is back in the news with this press release about how, in the last month, hackers have spoofed traffic trying to make it look like there's a tie with Trump. In other words, Alfa claims these packets are trying to frame them for a tie with Trump now, and thus (by extension) it must've been a frame last October.

There is no conspiracy here: it's just merry pranksters doing pranks (as this CNN article quotes me).

Indeed, among the people pranking has been me (not the pranks mentioned by Alfa, but different pranks). I ran a scan sending packets from that IP address to almost everyone on the Internet, and set the reverse lookup to "mail1.trumpemail.com".



Sadly, my ISP doesn't allow me to put hyphens in the name, so it's not "trump-email.com" as it should be in order to prank well.

Geeks gonna geek and pranksters gonna prank. I can imagine all sorts of other fun pranks somebody might do in order to stir the pot. Since the original news reports of the AlfaBank/trump-email.com connection last year, we have to assume any further data is tainted by goofballs like me goofing off.


By the way, in my particular case, there's a good lesson to be had here about the arbitrariness of IP addresses and names. There is no server located at my IP address of 209.216.230.75. No such machine exists. Instead, I run my scans from a nearby machine on the same network, and "spoof" that address with masscan:

$ masscan 0.0.0.0/0 -p80 --banners --spoof-ip 209.216.230.75

This sends a web request to every machine on the Internet from that IP address, despite no machine anywhere being configured with that IP address.

I point this out because people are confused by the meaning of an "IP address", or a "server", "domain", and "domain name". I can imagine the FBI looking into this and getting a FISA warrant for the server located at my IP address, and my ISP coming back and telling them that no such server exists, nor has a server existed at that IP address for many years.

In the case of last year's story, there's little reason to believe IP spoofing was happening, but the conspiracy theory still breaks down for the same reason: the association between these concepts is not what you think it is. Listrak, the owner of the server at the center of the conspiracy, still reverse resolves the IP address 66.216.133.29 as "mail1.trump-email.com", either because they are lazy, or because they enjoy the lulz.
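You can check a PTR record like that yourself with dig; the answer shown below is the one described here, not a guarantee of what the record returns today:

$ dig -x 66.216.133.29 +short
mail1.trump-email.com.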


It's absurd thinking anything sent by the server is related to the Trump Organization today, and it's equally plausible that nothing the server sent was related to Trump last year either, especially since (as CNN reports) Trump had severed its ties with Cendyn (the marketing company that uses Listrak servers for email).





Also, as mentioned in a previous blog post, I set my home network's domain to be "moscow.alfaintra.net", which means that some of my DNS lookups at home are actually being sent to Alfa Bank. I should probably turn this off before the FBI comes knocking at my door.
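For anyone wondering how a search domain leaks queries like that, here is a minimal sketch; the hostname is a hypothetical example of mine, the domain is the one from the post:

# /etc/resolv.conf contains:  search moscow.alfaintra.net
$ getent hosts printer
# the resolver appends the search suffix and tries printer.moscow.alfaintra.net,
# and answering that query means asking the authoritative nameservers for
# alfaintra.net -- which are run by Alfa Bank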


by Robert Graham (noreply@blogger.com) at March 19, 2017 06:45 AM

Assert() in the hands of bad coders

Using assert() creates better code, as programmers double-check assumptions. But only if used correctly. Unfortunately, bad programmers tend to use them badly, making code worse than if no asserts were used at all. They are a nuanced concept that most programmers don't really understand.

We saw this recently with the crash of "Bitcoin Unlimited", a version of Bitcoin that allows more transactions. They used an assert() to check the validity of input, and when they received bad input, most of the nodes in the network crashed.

The Bitcoin Classic/Unlimited code is full of bad uses of assert. The following examples are all from the file main.cpp.



Example #1: this line of code:

    if (nPos >= coins->vout.size() || coins->vout[nPos].IsNull())
        assert(false);

This use of assert is silly. The code should look like this:

    assert(nPos < coins->vout.size());
    assert(!coins->vout[nPos].IsNull());

This is the least of their problems. It's understandable that as code ages, and things are added/changed, odd looking code like this appears. But still, it's an example of wrong thinking about asserts. Among the problems this would cause: if asserts were ever turned off, you'd have to deal with dead code elimination warnings in static analyzers.


Example #2: this line of code:

    assert(view.Flush());

The code within assert is supposed to only read values, not change values. In this example, the Flush function changes things. Normally, asserts are only compiled into debug versions of the code, and removed for release versions. However, doing so for Bitcoin will cause the program to behave incorrectly, as things like the Flush() function are no longer called. That's why they put the following at the top of this code, to inform people that assertions must be left enabled.

#if defined(NDEBUG)
# error "Bitcoin cannot be compiled without assertions."
#endif


Example #3: this line of code:

    CBlockIndex* pindexNew = new CBlockIndex(block);
    assert(pindexNew);

The new operator never returns NULL, but throws its own exception instead. Not only is this a misconception about what new does, it's also a misconception about assert. The assert is supposed to check for bad code, not check errors.


Example #4: this line of code:

    BlockMap::iterator mi = mapBlockIndex.find(inv.hash);
    CBlock block;
    const Consensus::Params& consensusParams = Params().GetConsensus();
    if (!ReadBlockFromDisk(block, (*mi).second, consensusParams))
        assert(!"cannot load block from disk");

This is the feature that crashed Bitcoin Unlimited, and would also crash main Bitcoin nodes that use the "XTHIN" feature. The problem comes from parsing input (inv.hash). If the parsed input is bad, then the block won't exist on the disk, and the assert will fail, and the program will crash.

Again, assert is for checking for bad code that leads to impossible conditions, not checking errors in input, or checking errors in system functions.


Conclusion

The above examples were taken from only one file in the Bitcoin Classic source code. They demonstrate the typical wrong ways bad programmers use asserts. They'd make a great example to show programming students how not to write code.

More generally, though, it shows why there's a difference between 1x and 10x programmers. 1x programmers, like those writing Bitcoin code, make the typical mistake of treating assert() as error checking. The nuance of assert is lost on them.




Updated to reflect that I'm referring to the "Bitcoin Classic" source code, which isn't the "Bitcoin Core" source code. However, all the problems above appear to also be problems in the Bitcoin Core source code.


by Robert Graham (noreply@blogger.com) at March 19, 2017 06:41 AM

March 16, 2017

The Lone Sysadmin

vCenter 6.5b Resets Root Password Expiration Settings

I’m starting to update all my 6.x vCenters and vROPS, pending patches being released. You should be doing this, too, since they’re vulnerable to the Apache Struts 2 critical security holes. One thing I noted in my testing is that after patching the 6.5 appliances, their root password expiration settings go back to the defaults. […]

The post vCenter 6.5b Resets Root Password Expiration Settings appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at March 16, 2017 09:23 PM

March 13, 2017

Sean's IT Blog

My Windows 10 Template Build Process

I’ve been spending a lot of time working with Windows 10 in the lab lately, and one of the big struggles I’ve faced was building a solid template that I could reuse.  The reason I’ve had trouble with this is due to changes that Microsoft made in Windows 10 that essentially break the process that worked with previous versions of Windows.  The biggest changes include Modern UI applications and changes to how default applications are handled.

Modern UI apps are problematic for many reasons.  First, some of the original core Windows applications have been replaced by Modern UI applications, so while it is possible to remove them, you lose significant functionality that may not be replaced by 3rd Party applications.  In order to keep these applications up-to-date, the Windows Store needs to be available on the desktop.  That also means that the Store can’t be disabled unless you want to run outdated Modern UI applications.  Second, Modern UI apps tend to break Sysprep if any user profiles exist on the system outside of the built-in Administrator. 

Default applications are another tricky area.  In previous versions of Windows, a web browser or document viewer could set itself, or prompt the user to set it, as the default application for certain file types or URLs.  So if you installed Adobe Reader, it could set itself up as the default application for PDF files.  This does not necessarily appear to be the case in Windows 10 – some applications that manage common file types have a system default that applies to all users.  This is mainly true for URLs and PDF files, and they default to Microsoft Edge.  While I can change this on a per-user basis, I may want to enforce certain corporate application standards within my image.

I’ve been spending a lot of time building Windows 10 desktop templates in my lab, and I’ve been looking at a lot of Windows 10 build guides.  All of the ones I’ve seen treat a Windows 10 build like a Windows 7 build with some minor changes, but none of them address the issues that I’ve experienced in my lab.

To get around issues with Modern UI applications, managing and/or cleaning up user accounts on the system before sysprepping my template, and dealing with application defaults, I decided to put together a different method for building my Windows 10 VDI image to address the issues I’ve faced and to reduce the number of manual steps that I have to take when creating and/or updating the template.  The main thing that I do differently is the use of Sysprep Audit Mode.  Audit mode allows an administrator to bypass the OOBE and log into the desktop as a local administrator with network access to customize the desktop environment, and the system remains in Audit Mode until Sysprep is run again.  While in Audit Mode, I cannot join the computer to the domain.  However, this is not a deal breaker as I can access my file shares without being joined to the domain.

When building this template, I don’t do a defrag or run the VMware OS Optimization tool as this template is the grandparent VM.  I will deploy my production parent VMs from this template and optimize them before deploying my instant clone or full clone pools.  I also don’t feel that defrag is needed with disposable VMs running on modern storage solutions.

My steps for building a Windows 10 virtual desktop template are:

1. Create the VM in vCenter.  Select the VMXNet3 network card as the network device and configure the VM to boot directly into the BIOS.  Also be sure to attach the Windows 10 ISO to the virtual machine.

2. Power the VM on.

3. Open the Console or use the remote console to access the VM.

4. Disable the Floppy drive and the Serial, Parallel, and Floppy Controllers in the BIOS.  Press F10 to save the settings and reboot.

5. Boot into the Windows Installation disk.

6. When you reach the first screen of the Windows installer, press Shift-F10 to open a command prompt.

7. Type diskpart to launch the disk partition utility.  We’ll use this to create our partition manually.  By default, the Windows installer creates a partition table that includes 100MB of reserved space for Bitlocker.  Bitlocker isn’t supported in VDI, so we will create the partition ourselves.  The steps to do that are:

  • Type Select Disk 0
  • Type Create Partition Primary
  • Type Exit twice

8. Install Windows 10 using the default options.

9. When the system boots the Out-of-Box-Experience (Windows Welcome/Set up New Account), press Control-Shift-F3 to boot into Sysprep Audit Mode.

10. Install VMware Tools and reboot.

11.  Install the Horizon agent and reboot.  Note: You may need to connect to a network file share to access installers.  When doing so, sign in as Domain\User when prompted.  Do not join the system to the domain while in Audit Mode.

12. Install any applications and/or updates that you want to have in the template.  Reboot as often as required; the system will boot back into Audit Mode each time.

13.  Remove any Modern UI apps that you don’t want to provision as part of the template.  I remove all except for Photos (haven’t found a good free alternative), Calculator (the Windows 10 Calculator is actually pretty good), and Store (might need it depending on my use case/to keep the other two updated).  You actually need to remove them twice – once for the existing user accounts and once at the system level to remove the provisioned AppX package.  The steps for this are:

  • Open PowerShell as an Administrator.
  • Run the following command to remove the AppX packages from existing user accounts: Get-AppxPackage -AllUsers | Where {($_.Name -notlike "*Photos*") -and ($_.Name -notlike "*Calculator*") -and ($_.Name -notlike "*Store*")} | Remove-AppxPackage
  • Run the following command to deprovision the AppX packages so they are not installed for new users: Get-AppxProvisionedPackage -Online | Where {($_.DisplayName -notlike "*Photos*") -and ($_.DisplayName -notlike "*Calculator*") -and ($_.DisplayName -notlike "*Store*")} | Remove-AppxProvisionedPackage -Online

14.  Configure the application defaults for your administrator account.  This can be done in Settings –> System –> Default Apps.

15. Now we’re going to replace the default application associations.  Windows stores these in an XML file, and these associations are installed for each new user that logs into the system.  This file is called DefaultOEMAssociations.xml, and it is located in C:\Windows\System32.  The steps for this are:

  • Back up the C:\Windows\System32\DefaultOEMAssociations.xml file.
  • Open PowerShell as an Administrator
  • Run the following command to export your Administrator account’s default app associations: dism /online /Export-DefaultAppAssociations:"%userprofile%\Desktop\NewDefaultAppAssociations.xml"
  • Run the following command to import your new default app associations: dism /online /Import-DefaultAppAssociations:"%userprofile%\Desktop\NewDefaultAppAssociations.xml"

16. Reboot the system.

17. After the system reboots, the sysprep window will pop up.  Select “Out-of-Box Experience,” “Generalize,” and Shut Down.  Click OK to run sysprep.

18. After the VM shuts down, convert it to a template and deploy a test VM.  The VM should boot to the Out-of-Box-Experience.  You can also use a customization spec when deploying templates from the VM, and while it will boot to the Out-of-Box-Experience, the customization will still run.

So these are the basic steps that I follow when building my Windows 10 template for VMware.  If you have any questions, please reach me on Twitter at @seanpmassey.


by seanpmassey at March 13, 2017 04:53 PM

March 11, 2017

Steve Kemp's Blog

How I started programming

I've written parts of this story in the past, but never in one place and never in much detail. So why not now?

In 1982 my family moved house, so one morning I went to school and at lunch-time I had to walk home to a completely different house.

We moved sometime towards the end of the year, and ended up spending lots of money replacing the windows of the new place. For people in York: I was born in Farrar Street, YO10 3BY, and we moved to a place on Thief Lane, YO1 3HS. Being named as it was, I "ironically" stole at least two street-signs and hung them on my bedroom wall. I suspect my parents were disappointed.

Anyway the net result of this relocation, and the extra repairs meant that my sisters and I had a joint Christmas present that year, a ZX Spectrum 48k.

I tried to find pictures of what we received but unfortunately the web doesn't remember the precise bundle. All together though we received:

I know we also received Horace and the Spiders, and I have vague memories of some other things being included, including a Space Invaders clone. No doubt my parents bought them separately.

Highlights of my Spectrum-gaming memories include R-Type, Strider, and the various "Dizzy" games. Some of the latter I remember very fondly.

Unfortunately this Christmas was pretty underwhelming. We unpacked the machine, we cabled it up to the family TV-set - we only had the one, after all - and then proceeded to be very disappointed when nothing we did resulted in a successful game! It turns out our cassette-deck was not good enough. This being the 80s, the shops were closed over Christmas, and my memory is that it was around January before we received a working tape-player/recorder, such that we could load games.

Happily the computer came with manuals. I read one, skipping words and terms I didn't understand. I then read the other, which was the spiral-bound orange book. It contained enough examples and decent wording that I learned to write code in BASIC. Not bad for an 11/12 year old.

Later I discovered that my local library contained "computer books". These were colourful books that promised "The Mystery of Silver Mountain", or "Write your own ADVENTURE PROGRAMS". But they were largely dry books that contained nothing but multi-page listings of BASIC programs to type in, often with adjustments that had to be made for your own computer-flavour (BASIC varying between different systems).

If you want to recapture the magic, scroll to the foot of this Osbourne page and you can download them!

Later I taught myself Z80 Assembly Language, partly via the Spectrum manual and partly via such books as these two (which I still own 30ish years later):

  • Understanding your Spectrum, Basic & Machine Code Programming.
    • by Dr Ian Logan
  • An introduction to Z80 Machine Code.
    • R.A & J.W Penfold

Pretty much the only reason I continued down this path is because I wanted infinite/extra lives in the few games I owned. (Which were largely pirated via the schoolboy network of parents with cassette-copiers.)

Eventually I got some of my l33t POKEs printed in magazines, and received free badges from the magazines of the day such as Your Sinclair & Sinclair User. For example I was "Hacker of the Month" in Your Sinclair issue 67, page 32, apparently because I "asked so nicely in my letter".

Terrible scan is terrible:

Anyway that takes me from 1980ish to 1984. The only computer I ever touched was a Spectrum. Friends had other things, and there were Sega consoles, but I have no memories of them. Suffice it to say that later when I first saw a PC (complete with Hercules graphics, hard drives, and similar sorcery, running GEM IIRC) I was pleased that Intel assembly was "similar" to Z80 assembly - and now I know the reason why.

Some time in the future I might document how I got my first computer job. It is hilarious. As was my naivete.

March 11, 2017 10:00 PM

March 07, 2017

The Lone Sysadmin

How Not To Quit Your Job

I’ve thought a lot lately about Michael Thomas, a moron who caused criminal amounts of damage to his former employer in the process of quitting. From The Register[0]: As well as deleting ClickMotive’s backups and notification systems for network problems, he cut off people’s VPN access and “tinkered” with the Texas company’s email servers. He deleted […]

The post How Not To Quit Your Job appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at March 07, 2017 05:08 AM

March 05, 2017

Cryptography Engineering

Secure computing for journalists

This morning on Twitter, Buzzfeed editor Miriam Elder asks the following question:

No, this is not a stupid question. Actually it’s an extremely important question, and judging by some of the responses to this Tweet there are a lot of other people who are confused about the answer.

Since I couldn’t find a perfect layperson’s reference anywhere else, I’m going to devote this post to providing the world’s simplest explanation of why, in the threat model of your typical journalist, your desktop machine isn’t very safe. And specifically, why you’re safer using a modern mobile device — and particularly, an iOS device — than just about any other platform.

A brief caveat: I’m a cryptographer, not a software security researcher. However, I’ve spent the past several years interacting with folks like Charlie and Dan and Thomas. I’m pretty confident that they agree with this advice.

What’s wrong with my laptop/desktop machine?

Sadly, most of the problem is you.

If you’re like most journalists — and really, most professionals — you spend less than 100% of your time thinking about security. You need to get work done. When you’re procrastinating from work, you visit funny sites your friends link you to on Facebook. Then you check your email. If you’re a normal and productive user, you probably do a combination of all these things every few minutes, all of which culminates in your downloading some email attachment and (shudder) opening it in Word.

Now I’m not trying to shame you for this. It’s perfectly normal, and indeed it’s necessary if you want to get things done.  But in the parlance of security professionals, it also means you have a huge attack surface.

In English, this means that from the perspective of an attacker there are many different avenues to compromise your machine. Many of these aren’t even that sophisticated. Often it’s just a matter of catching you during an unguarded moment and convincing you to download an executable file or an infected Office document. A compromised machine means that every piece of software on that machine is also vulnerable.

If you don’t believe this works, head over to Google and search for “Remote Access Trojans”. There’s an entire commercial market for these products, each of which allows you to remotely control someone else’s computer. These off-the-shelf products aren’t very sophisticated: indeed, most require you to trick your victim into downloading and running some executable attachment. Sadly, this works on most people just fine. And this is just the retail stuff. Imagine what a modestly sophisticated attacker can do.

I do some of those things on my phone as well. Why is a phone better?

Classical (desktop and laptop) operating systems were designed primarily to support application developers. This means they offer a lot of power to your applications. An application like Microsoft Word can typically read and write all the files available to your account. If Word becomes compromised, this is usually enough to pwn you in practice. And in many cases, these applications have components with root (or Administrator) access, which makes them even more dangerous.

Modern phone operating systems like Android and iOS were built on a different principle. Rather than trusting apps with much power, each app runs in a “sandbox” that (mainly) limits it to accessing its own files. If the sandbox works, even a malicious application shouldn’t be able to reach out to touch other apps’ files or permanently modify your system. This approach — combined with other protections such as in-memory code signing, hardware secret storage and routine use of anti-exploitation measures — makes your system vastly harder to compromise.

Of course, sandboxing isn’t perfect. A compromised or malicious app can always access its own files. More sophisticated exploits can “break out” of the sandbox, typically by exploiting a vulnerability in the operating system. Such vulnerabilities are routinely discovered and occasionally exploited.

The defense to this is twofold: (1) first, run a modern, up-to-date OS that receives security patches quickly. And (2) avoid downloading malicious apps. Which brings me to the main point of this post.

Why use iOS?

The fact of the matter is that when it comes to addressing these remaining issues, Apple phone operating systems (on iPhones and iPads) simply have a better track record.

Since Apple is the only manufacturer of iOS devices, there is no “middleman” when it comes to monitoring for iOS issues and deploying iOS security updates. This means that the buck stops at Apple — rather than with some third-party equipment manufacturer. Indeed, Apple routinely patches its operating systems and pushes the patches to all supported users — sometimes within hours of learning of a vulnerability (something that is relatively rare at this point in any case).

Of course, to be fair: Google has also become fairly decent at supporting its own Android devices. However, to get assurance from this process you need to be running a relatively brand new device and it needs to be manufactured by Google. Otherwise you’re liable to be several days or weeks behind the time when a security issue is discovered and patched — if you ever get it. And Google still does not support all of the features Apple does, including in-memory code signing and strong file encryption.

Apple also seems to do a relatively decent job at curating its App Store, at least as compared to Google. And because those apps support a more modern base of phones, they tend to have access to better security features, whereas Android apps more routinely get caught doing dumb stuff for backwards compatibility reasons.

A password manager using the SEP.

Finally, every recent Apple device (starting with the iPhone 5S and up) also includes a specialized chip known as a “Secure Enclave Processor“. This hardened processor assists in securing the boot chain — ensuring that nobody can tamper with your operating system. It can also protect sensitive values like your passwords, ensuring that only a password or fingerprint can access them.

A few Android phones also offer similar features as well. However, it’s unclear how well these are implemented in contrast to Apple’s SEP. It’s not a bet I would choose to take.

So does using iOS mean I’m perfectly safe?

Of course not. Unfortunately, computer security today is about resisting attacks. We still don’t quite know how to prevent them altogether.

Indeed, well-funded attackers like governments are still capable of compromising your iOS device (and your Android, and your PC or Mac). Literally the only question is how much they’ll have to spend doing it.

Here’s one data point. Last year a human rights activist in the UAE was targeted via a powerful zero day exploit, likely by his government. However, he was careful. Instead of clicking the link he was sent, the activist sent it to the engineers at Citizenlab who reverse-engineered the exploit. The resulting 35-page technical report by Lookout Security and Citizenlab is a thing of terrifying beauty: it describes a chain of no less than three previously unpublished software exploits, which together would have led to the complete compromise of the victim’s iPhone.

But such compromises don’t come cheap. It’s easy to see this kind of attack costing a million dollars or more. This is probably orders of magnitude more than it would cost to compromise the typical desktop user. That’s important. Not perfect, but important.

You’re telling me I have to give up my desktop machine?

Not at all. Or rather, while I’d love to tell you that, I understand this may not be realistic for most users.

All I am telling you to do is to be thoughtful. If you’re working on something sensitive, consider moving the majority of that work (and communications) to a secure device until you’re ready to share it. This may be a bit of a hassle, but it doesn’t have to be your whole life. And since most of us already carry some sort of phone or tablet in addition to our regular work computer, hopefully this won’t require too much of a change in your life.

You can still use your normal computer just fine, as long as you’re aware of the relative risks. That’s all I’m trying to accomplish with this post.

In conclusion

I expect that many technical people will find this post objectionable, largely because they assume that with their expertise and care they can make a desktop operating system work perfectly safely. And maybe they can! But that’s not who this post is addressed to.

And of course, this post still only scratches the surface of the problem. There’s still the problem of selecting the right applications for secure messaging (e.g., Signal and WhatsApp) and finding a good secure application for notetaking and document collaboration and so on.

But hopefully this post at least starts the discussion.


by Matthew Green at March 05, 2017 06:38 PM

Michael Biven

Why We Use Buzzwords

WebOps, SecOps, NetOps, DevOps, DevSecOps, ChatOps, NoOps, DataOps. They’re not just buzzwords. They’re a sign of a lack of resources.

I believe one reason why people keep coming up with newer versions of *Ops is that there isn’t enough time or people to own the priorities that are being prepended to it. Forget about the acronyms and the buzzwords and ask yourself why a new *Ops keeps coming up.

It’s not always marketing hype. When people are struggling to find the resources to address one or multiple concerns they’ll latch onto anything that might help.

When systems administrators started building and supporting services that ran web sites we started calling it WebOps to differentiate that type of work from the more traditional SA role.

When DevOps came about, it was a reaction from SAs trying to build things at a speed that matched the needs of the companies that employed them. The success of using the label DevOps to frame that work encouraged others to orientate themselves around it as a way to achieve that same level of success.

Security is an early example and I can remember seeing talks and posts related to using DevOps as a framework to meet their needs.

Data related work seemed to be mostly absent. It seems instead we got a number of services, consultants, and companies that focused around providing a better database or some kind of big data product.

Reflecting back, there are two surprises. The first is the early trend of including nontechnical departments in the DevOps framework; adding your marketing department was one I saw a couple of different posts and talks on.

The second is that even with Software Defined Networking providing a programmatic path for traditional network engineering into the DevOps framework, it really hasn’t been a standout. Most of the tools available are usually tied to big expensive gear, and this seems to have kept network engineering outside of DevOps. The difference is that if you are using a cloud platform that provides SDN, then DevOps can cover the networking side.

ChatOps is the other interesting one, because it’s the one focused on the least technical thing: the difficulty people have communicating with other parts of their company and easily finding answers to the basic questions that frequently come up.

This got me thinking about breaking the different types of engineering and/or development work needed into three common groups, and about the few roles that have either been removed from some companies, outsourced, or left understaffed.

The three common groups are software engineering, covering front-end, back-end, mobile, and tooling; systems engineering, providing guidance and support for the entire product lifecycle, including traditional SA and operations roles, SRE, and tooling; and data engineering, which covers traditional DBA roles, analytics, and big data.

Then you have QA, network engineering, and information security, which seem to have gone away unless you’re at a company that’s big enough to provide a dedicated team or that has a specific requirement for them.

For QA, it seems we’ve moved it from being a role into a responsibility everyone is accountable for.

Network engineering and security are two areas where I’m starting to wonder whether they’ll be dispersed across the three common groups, just as QA has been for the most part. Maybe SDN would move control of the network to software and systems engineers? There is an obvious benefit to holding everyone accountable for security, as we’ve done with QA.

This wouldn’t mean there’s no need for network and security professionals anymore, but that you would probably only find them in certain circumstances.

Which brings me back to my original point. Why the need for all of these buzzwords? With the one exception, every single version of *Ops is focused on work that isn’t product feature work. We’re seeing reductions in the number of roles focused on that type of work while at the same time its importance has increased.

I think it’s important that we’re honest with ourselves and consider that maybe the way we do product management and project planning hasn’t caught up with the changes in the non-product areas of engineering. If it had, do you think we would have seen all of these new buzzwords?

March 05, 2017 05:48 PM

Vincent Bernat

Netops with Emacs and Org mode

Org mode is a package for Emacs to “keep notes, maintain todo lists, planning projects and authoring documents”. It can execute embedded snippets of code and capture the output (through Babel). It’s an invaluable tool for documenting your infrastructure and your operations.

Here are three (relatively) short videos exhibiting Org mode use in the context of network operations. In all of them, I am using my own junos-mode which features the following perks:

  • syntax highlighting for configuration files,
  • commit of configuration snippets to remote devices, and
  • execution of remote commands.

Since some Junos devices can be quite slow, commits and remote executions are done asynchronously1 with the help of a Python helper.

In the first video, I take some notes about configuring BGP add-path feature (RFC 7911). It demonstrates all the available features of junos-mode.

In the second video, I execute a planned operation to enable this feature in production. The document is a modus operandi and contains the configuration to apply and the commands to check if it works as expected. At the end, the document becomes a detailed report of the operation.

In the third video, a cookbook has been prepared to execute some changes. I set some variables and execute the cookbook to apply the change and check the result.


  1. This is a bit of a hack since Babel doesn’t have native support for that. Also have a look at ob-async which is a language-independent implementation of the same idea. 

by Vincent Bernat at March 05, 2017 11:01 AM

March 01, 2017

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – February 2017

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts/topics this month:
  1. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010, so I have no idea why it tops the charts now!) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here!
  2. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009. Is it relevant now? You be the judge.  Succeeding with SIEM requires a lot of work, whether you paid for the software, or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” … 
  3. “Simple Log Review Checklist Released!” is often at the top of this list – this aging checklist is still a very useful tool for many people. “On Free Log Management Tools” (also aged a bit by now) is a companion to the checklist (updated version)
  4. This month, my classic PCI DSS Log Review series is extra popular! The series of 18 posts covers a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book (now in its 4th edition!) – note that this series is mentioned in some PCI Council materials.
  5. “SIEM Resourcing or How Much the Friggin’ Thing Would REALLY Cost Me?” is a quick framework for assessing the SIEM project (well, a program, really) costs at an organization (a lot more details on this here in this paper).
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has about 5X of the traffic of this blog]: 
 
Recent research on security analytics and UBA / UEBA:
 
Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Previous post in this endless series:

by Anton Chuvakin (anton@chuvakin.org) at March 01, 2017 04:15 PM

LZone - Sysadmin

Match Structured Facts in MCollective

If you are using Facter 2+, which is what you get when you run at least Puppet 4, then you have structured facts (meaning nested values) like these:
processors => {
  count => 2,
  isa => "unknown",
  models => [
    "QEMU Virtual CPU version 2.1.2",
    "QEMU Virtual CPU version 2.1.2"
  ],
  physicalcount => 2
}
Now you cannot match those using
mco find -F <fact name>=<fact value>
If you try, you just get an empty result. The only way to match structured facts is using -S:
mco find -S 'fact("<fact name>").value=<value>'
For example:
mco find -S 'fact("networking.network").value=192.168.5.0'
mco find -S 'fact("os.distro.codename").value=jessie'
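Tying this back to the processors fact shown at the top, the same pattern applies (a sketch following the examples above):
mco find -S 'fact("processors.count").value=2'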
See also Mcollective Cheat Sheet

March 01, 2017 03:41 PM

Electricmonk.nl

HTTP Error 429 on Reddit

Getting HTTP error 429 when trying to call Reddit APIs or .json endpoints? Try changing your User-Agent header to something else: Reddit bans based on user agent.
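A quick way to test this is to set the header explicitly with curl; the User-Agent string below is just an illustrative value of mine:

$ curl -H 'User-Agent: linux:my-script:v0.1 (by /u/yourusername)' \
    'https://www.reddit.com/r/sysadmin/top.json?limit=5'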

by admin at March 01, 2017 09:31 AM

February 28, 2017

Cryptography Engineering

The future of Ransomware

This is kind of a funny post for me to write, since it involves speculating about a very destructive type of software — and possibly offering some (very impractical) suggestions on how it might be improved in the future. It goes without saying that there are some real downsides to this kind of speculation. Nonetheless, I’m going ahead on the theory that it’s usually better to talk and think about the bad things that might happen to you — before you meet them on the street and they steal your lunch money.

On the other hand, just as there’s a part of every karate master that secretly wants to go out and beat up a bar full of people, there’s a part of every security professional that looks at our current generation of attackers and thinks: why can’t you people just be a bit more imaginative?! And wonders whether, if our attackers were just a little more creative, people would actually pay attention to securing their system before the bad stuff happens.

And ransomware is definitely a bad thing. According to the FBI it sucks up $1 billion/year in payments alone, and some unimaginably larger amount in remediation costs. This despite the fact that many ransomware packages truly suck, and individual ransomware developers get routinely pwned due to making stupid cryptographic errors. If this strategy is working so well today, the question  we should be asking ourselves is: how much worse could it get?

So that’s what I’m going to muse about now. A few (cryptographic) ways that it might.

Some of these ideas are the result of collaboration with my students Ian Miers, Gabe Kaptchuk and Christina Garman. They range from the obvious to the foolish to the whimsical, and I would be utterly amazed if any of them really do happen. So please don’t take this post too seriously. It’s all just fun.

Quick background: ransomware today

The amazing thing about ransomware is that something so simple could turn out to be such a problem. Modern ransomware consists of malware that infects your computer and then goes about doing something nasty: it encrypts every file it can get its hands on. This typically includes local files as well as network shares that can be reached from the infected machine.

Once your data has been encrypted, your options aren’t great. If you’re lucky enough to have a recent backup, you can purge the infected machine and restore. Otherwise you’re faced with a devil’s bargain: learn to live without that data, or pay the bastards.

If you choose to pay up, there are all sorts of different procedures. However most break down into the following three steps:

  1. When the ransomware encrypts your files, it generates a secret key file and stores it on your computer.
  2. You upload that file (or data string) to your attackers along with a Bitcoin payment.
  3. They process the result with their secrets and send you a decryption key.

If you’re lucky, and your attackers are still paying attention (or haven’t screwed up the crypto beyond recognition) you get back a decryption key or a tool you can use to undo the encryption on your files. The whole thing is very businesslike. Indeed, recent platforms will allegedly offer you a discount if you recommend it to (that is, infect) your friends — just like Lyft!

The problem of course, is that nothing in this process guarantees that your attacker will give you that decryption key. They might be scammers. They might not have the secret anymore. They might get tracked down and arrested. Or they might get nervous and bail, taking your precious data and your payment with them. This uncertainty makes ransomware payments inherently risky — and worse, it’s the victims who mostly suffer for it.

Perhaps it would be nice if we could make that work better.

Verifiable key delivery using smart contracts

Most modern ransomware employs a cryptocurrency like Bitcoin to enable the payments that make the ransom possible. This is perhaps not the strongest argument for systems like Bitcoin — and yet it seems unlikely that Bitcoin is going away anytime soon. If we can’t solve the problem of Bitcoin, maybe it’s possible to use Bitcoin to make “more reliable” ransomware.

Recall that following a ransomware infection, there’s a possibility that you’ll pay the ransom and get nothing in return. Fundamentally there’s very little you can do about this. A conscientious ransomware developer might in theory offer a “proof of life” — that is, offer to decrypt a few files at random in order to prove their bonafides. But even if they bother with all the risk and interaction of doing this, there’s still no guarantee that they’ll bother to deliver the hostage alive.

An obvious approach to this problem is to make ransomware payments conditional. Rather than sending off your payment and hoping for the best, victims could use cryptocurrency features to ensure that ransomware operators can’t get paid unless they deliver a key. Specifically, a ransomware developer could easily perform payment via a smart contract script (in a system like Ethereum) that guarantees the following property:

This payment will be delivered to the ransomware operator if and only if the ransomware author unlocks it — by posting the ransomware decryption key to the same blockchain.

The basic primitive needed for this is called a Zero Knowledge Contingent Payment. This idea was proposed by Greg Maxwell and demonstrated by Sean Bowe of the ZCash team.**** The rough idea is to set the decryption key to be some pre-image k for some public hash value K that the ransomware generates and leaves on your system. It’s relatively easy to imagine a smart contract that allows payment if and only if the payee can post the input k such that K=SHA256(k). This could easily be written in Ethereum, and almost certainly has an analog for Bitcoin script.
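As a toy illustration of just the hash-lock portion of that idea (this shows only the preimage check, and says nothing about the zero-knowledge part that proves k actually decrypts your files):

# the ransomware picks k and leaves only K = SHA256(k) behind on the victim's machine
k=$(openssl rand -hex 32)
K=$(printf '%s' "$k" | sha256sum | awk '{print $1}')
# a contract would release payment to whoever publishes a candidate hashing to K
candidate="$k"   # in the happy path the operator reveals the real key
[ "$(printf '%s' "$candidate" | sha256sum | awk '{print $1}')" = "$K" ] && echo "release payment"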

The challenge here, of course, is to prove that k is actually a decryption key for your files, and that the files contain valid data. There are a handful of different ways to tackle this problem. One is to use complex zero-knowledge proof techniques (like zkSNARKs or ZKBoo) to make the necessary proofs non-interactively. But this is painful, and frankly above the level of most ransomware developers — who are still struggling with basic RSA.

An alternative approach is to use several such K challenges in combination with the “proof of life” idea. The ransomware operator would prove her bonafides by decrypting a small, randomly selected subset of files before the victim issues payment. The operator could still “fake” the encryption — or lose the decryption key — but she would be exposed with reasonable probability before money changed hands.

“Autonomous” ransomware

Of course, the problem with “verifiable” ransomware is: what ransomware developer would bother with this nonsense?

While the ability to verify decryption might conceivably improve customer satisfaction, it’s not clear that it would really offer that much value to ransomware developers. At the same time, it would definitely add a lot of nasty complexity to their software.

Instead of pursuing ideas that offer developers no obvious upside, ransomware designers presumably will pursue ideas that offer them some real benefits. And that brings us to an idea whose time has (hopefully) not quite come yet. The idea itself is simple:

Make ransomware that doesn’t require operators.

Recall that in the final step of the ransom process, the ransomware operator must deliver a decryption key to the victim. This step is the most fraught for operators, since it requires them to manage keys and respond to queries on the Internet. Wouldn’t it be better for operators if they could eliminate this step altogether?

Of course, to accomplish this seems to require a trustworthy third party — or better, a form of ransomware that can decrypt itself when the victim makes a Bitcoin payment. This last idea seems fundamentally contradictory. The decryption keys would have to live on the victim’s device, and the victim owns that device. If you tried that, then the victim could presumably just hack the secrets out and decrypt the ransomware without paying.

But what if the victim couldn’t hack their own machine?

This isn’t a crazy idea. In fact, it’s exactly the premise that’s envisioned by a new class of trusted execution environments, including Intel’s SGX and ARM TrustZone. These systems — which are built into the latest generation of many processors — allow users to instantiate “secure enclaves”: software environments that can’t be accessed by outside parties. SGX also isolates enclaves from other enclaves, which means the secrets they hold are hard to pry out.

Hypothetically, after infecting your computer a piece of ransomware could generate and store its decryption key inside of a secure enclave. This enclave could be programmed to release the key only on presentation of a valid Bitcoin payment to a designated address.

The beauty of this approach is that no third party even needs to verify the payment. Bitcoin payments themselves consist of a publicly-verifiable transaction embedded in a series of “blocks”, each containing an expensive computational “proof of work“. In principle, after paying the ransom the victim could present the SGX enclave with a fragment of a blockchain all by itself — freeing the ransomware of the need to interact with third parties. If the blockchain fragment exhibited sufficient hashpower along with a valid payment to a specific address, the enclave would release the decryption key.*

The good news is that Intel and ARM have devoted serious resources to preventing this sort of unauthorized access. SGX developers must obtain a code signing certificate from Intel before they can make production-ready SGX enclaves, and it seems unlikely that Intel would partner up with a ransomware operation. Thus a ransomware operator would likely have to (1) steal a signing key from a legitimate Intel-certified developer, or (2) find an exploitable vulnerability in another developer’s enclave.**, ***

This all seems sort of unlikely, and that appears to block most of the threat — for now. Assuming companies like Intel and Qualcomm don’t screw things up, and have a good plan for revoking enclaves (uh oh), this is not very likely to be a big threat.

Of course, in the long run developers might not need Intel SGX at all. An even more speculative concern is that developments in the field of cryptographic obfuscation will provide a software-only alternative means to implement this type of ransomware. This would eliminate the need for a dependency like SGX altogether, allowing the ransomware to do its work with no hardware at all.

At present such techniques are far north of practical, keep getting broken, and might not work at all. But cryptographic researchers keep trying! I guess the lesson is that it’s not all roses if they succeed.

Ransomware Skynet

Since I’m already this far into what reads like a Peyote-fueled rant, let’s see if we can stretch the bounds of credibility just a little a bit farther. If ransomware can become partially autonomous — i.e., do part of its job without the need for human masters — what would it mean for it to become fully autonomous? In other words, what if we got rid of the rest of the human equation?

I come from the future to encrypt C:\Documents

Ransomware with the ability to enforce payments would provide a potent funding source for another type of autonomous agent: a Decentralized Autonomous Organization, or (DAO). These systems are “corporations” that consist entirely of code that runs on a consensus network like Ethereum. They’re driven by rules, and are capable of both receiving and transmitting funds without (direct) instruction from human beings.

At least in theory it might be possible to develop a DAO that’s funded entirely by ransomware payments — and in turn mindlessly contracts real human beings to develop better ransomware, deploy it against human targets, and… rinse repeat. It’s unlikely that such a system would be stable in the long run — humans are clever and good at destroying dumb things — but it might get a good run. Who knows? Maybe this is how the Rampant Orphan Botnet Ecologies get started.

(I hope it goes without saying that I’m mostly not being serious about this part. Even though it would be totally awesome in a horrible sort of way.)

In conclusion

This hasn’t been a terribly serious post, although it was fun to write. The truth is that as a defender, watching your attackers fiddle around is pretty much the most depressing thing ever. Sometimes you have to break the monotony a bit.

But insofar as there is a serious core to this post, it’s that ransomware currently is using only a tiny fraction of the capabilities available to it. Secure execution technologies in particular represent a giant footgun just waiting to go off if manufacturers get things only a little bit wrong.

Hopefully they won’t, no matter how entertaining it might be.

Notes:

* This technique is similar to SPV verification. Of course, it would also be possible for a victim to “forge” a blockchain fragment without paying the ransom. However, the cost of this could easily be tuned to significantly exceed the cost of paying the ransom. There are also many issues I’m glossing over here like difficulty adjustments and the possibility of amortizing the forgery over many different victims. But thinking about that stuff is a drag, and this is all for fun, right?

** Of course, if malware can exploit such a vulnerability in another developer’s enclave to achieve code execution for “ransomware”, then the victim could presumably exploit the same vulnerability to make the ransomware spit out its key without a payment. So this strategy seems self-limiting — unless the ransomware developers find a bug that can be “repaired” by changing some immutable state held by the enclave. That seems like a long shot. And no, SGX does not allow you to “seal” data to the current state of the enclave’s RAM image.

*** In theory, Intel or an ARM manufacturer could also revoke the enclave’s signing certificate. However, the current SGX specification doesn’t explain how such a revocation strategy should work. I assume this will be more prominent in future specifications.

**** The original version of this post didn’t credit Greg and Sean properly, because I honestly didn’t make the connection that I was describing the right primitive. Neat!


by Matthew Green at February 28, 2017 12:47 AM

February 27, 2017

LZone - Sysadmin

Nagios Check for Systemd Failed Units

Just a short bash script to check for faulty systemd units to avoid 107 lines of Python...
#!/bin/bash

if [ -f /bin/systemctl ]; then
    failed=$(/bin/systemctl --failed --no-legend)
    failed=${failed/ */}          # Strip everything after first space
    failed=${failed/.service/}    # Strip .service suffix

    if [ "$failed" != "" ]; then
        echo "Failed units: $failed"
        exit 1
    else
        echo "No failed units."
        exit 0
    fi
else
    echo "No systemd. Nothing was checked!"
    exit 0
fi

February 27, 2017 05:50 PM

February 25, 2017

Evaggelos Balaskas

Docker Swarm a native clustering system

Docker Swarm

Docker Swarm is the native Docker container orchestration system. In simple terms it means that you can have multiple docker machines (hosts) running your multiple docker containers (replicas). It is best to use Docker Engine v1.12 or above, as from that version the docker engine includes docker swarm natively.

Docker Swarm logo:
docker-swarm.png

In not so simple terms: docker instances (engines) running on multiple machines (nodes), communicating together (over VXLAN) as a cluster (swarm).

Nodes

To begin with, we need to create our docker machines. One of the nodes must be the manager and the others will run as workers. For testing purposes I will run three (3) docker engines:

  • Manager Docker Node: myengine0
  • Worker Docker Node 1: myengine1
  • Worker Docker Node 2: myengine2

Drivers

A docker node is actually a machine that runs the docker engine in swarm mode. The machine can be physical or virtual: a VirtualBox VM, a cloud instance, a VPS, an AWS instance, etc.

At the time of this blog post, docker officially supports the following drivers natively:

  • Amazon Web Services
  • Microsoft Azure
  • Digital Ocean
  • Exoscale
  • Google Compute Engine
  • Generic
  • Microsoft Hyper-V
  • OpenStack
  • Rackspace
  • IBM Softlayer
  • Oracle VirtualBox
  • VMware vCloud Air
  • VMware Fusion
  • VMware vSphere

but there are unofficial drivers also.

QEMU - KVM

I will use the qemu-kvm driver from this github repository: https://github.com/dhiltgen/docker-machine-kvm

The simplest way to add the kvm driver is this:


> cd /usr/local/bin/
> sudo -s
# wget -c https://github.com/dhiltgen/docker-machine-kvm/releases/download/v0.7.0/docker-machine-driver-kvm
# chmod 0750 docker-machine-driver-kvm

Docker Machines

The next thing we need to do is create our docker machines. First install docker-machine from your distro’s repositories:

# yes | pacman -S docker-machine

Manager


$ docker-machine create -d kvm myengine0

Running pre-create checks...
Creating machine...
(myengine0) Image cache directory does not exist, creating it at /home/ebal/.docker/machine/cache...
(myengine0) No default Boot2Docker ISO found locally, downloading the latest release...
(myengine0) Latest release for github.com/boot2docker/boot2docker is v1.13.1
(myengine0) Downloading /home/ebal/.docker/machine/cache/boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v1.13.1/boot2docker.iso...
(myengine0) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
(myengine0) Copying /home/ebal/.docker/machine/cache/boot2docker.iso to /home/ebal/.docker/machine/machines/myengine0/boot2docker.iso...

Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with boot2docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env myengine0

Worker 1


$ docker-machine create -d kvm myengine1
Running pre-create checks...
Creating machine...
(myengine1) Copying /home/ebal/.docker/machine/cache/boot2docker.iso to /home/ebal/.docker/machine/machines/myengine1/boot2docker.iso...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with boot2docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env myengine1

Worker 2

$ docker-machine create -d kvm myengine2
Running pre-create checks...
Creating machine...
(myengine2) Copying /home/ebal/.docker/machine/cache/boot2docker.iso to /home/ebal/.docker/machine/machines/myengine2/boot2docker.iso...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with boot2docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env myengine2

List your Machines


$ docker-machine env myengine0
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.42.126:2376"
export DOCKER_CERT_PATH="/home/ebal/.docker/machine/machines/myengine0"
export DOCKER_MACHINE_NAME="myengine0"
# Run this command to configure your shell:
# eval $(docker-machine env myengine0)

$ docker-machine ls

NAME        ACTIVE   DRIVER   STATE     URL                         SWARM   DOCKER    ERRORS
myengine0   -        kvm      Running   tcp://192.168.42.126:2376           v1.13.1
myengine1   -        kvm      Running   tcp://192.168.42.51:2376            v1.13.1
myengine2   -        kvm      Running   tcp://192.168.42.251:2376           v1.13.1

Inspect

You can get the IP of your machines with:


$ docker-machine ip myengine0
192.168.42.126

$ docker-machine ip myengine1
192.168.42.51

$ docker-machine ip myengine2
192.168.42.251

with ls as seen above, or use the inspect parameter for a full list of information about your machines in JSON format:


$ docker-machine inspect myengine0

If you have jq installed, you can filter out some info:


$ docker-machine inspect myengine0  | jq .'Driver.DiskPath'

"/home/ebal/.docker/machine/machines/myengine0/myengine0.img"

SSH

To get a shell inside a kvm docker machine, you can use ssh:

Manager


$ docker-machine ssh myengine0 

        [boot2docker ASCII art banner]
Boot2Docker version 1.13.1, build HEAD : b7f6033 - Wed Feb  8 20:31:48 UTC 2017
Docker version 1.13.1, build 092cba3

Worker 1


$ docker-machine ssh myengine1 

        [boot2docker ASCII art banner]
Boot2Docker version 1.13.1, build HEAD : b7f6033 - Wed Feb  8 20:31:48 UTC 2017
Docker version 1.13.1, build 092cba3

Worker 2


$ docker-machine ssh myengine2

        [boot2docker ASCII art banner]
Boot2Docker version 1.13.1, build HEAD : b7f6033 - Wed Feb  8 20:31:48 UTC 2017
Docker version 1.13.1, build 092cba3

Swarm Cluster

Now it’s time to build a swarm of docker machines!

Initialize the manager

docker@myengine0:~$  docker swarm init --advertise-addr 192.168.42.126

Swarm initialized: current node (jwyrvepkz29ogpcx18lgs8qhx) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-4vpiktzp68omwayfs4c3j5mrdrsdavwnewx5834g9cp6p1koeo-bgcwtrz6srt45qdxswnneb6i9 \
    192.168.42.126:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

Join Worker 1

docker@myengine1:~$  docker swarm join \
>     --token SWMTKN-1-4vpiktzp68omwayfs4c3j5mrdrsdavwnewx5834g9cp6p1koeo-bgcwtrz6srt45qdxswnneb6i9 \
>     192.168.42.126:2377

This node joined a swarm as a worker.

Join Worker 2

docker@myengine2:~$   docker swarm join \
>     --token SWMTKN-1-4vpiktzp68omwayfs4c3j5mrdrsdavwnewx5834g9cp6p1koeo-bgcwtrz6srt45qdxswnneb6i9 \
>     192.168.42.126:2377

This node joined a swarm as a worker.

From the manager


docker@myengine0:~$  docker node ls

ID                           HOSTNAME   STATUS  AVAILABILITY  MANAGER STATUS
jwyrvepkz29ogpcx18lgs8qhx *  myengine0  Ready   Active        Leader
m5akhw7j60fru2d0an4lnsgr3    myengine2  Ready   Active
sfau3r42bqbhtz1c6v9hnld67    myengine1  Ready   Active

Info

We can find more information about the docker machines by running the docker info command after ssh-ing into each node:

e.g. the swarm part:

manager


Swarm: active
 NodeID: jwyrvepkz29ogpcx18lgs8qhx
 Is Manager: true
 ClusterID: 8fjv5fzp0wtq9hibl7w2v65cs
 Managers: 1
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 192.168.42.126
 Manager Addresses:
  192.168.42.126:2377

worker1

Swarm: active
 NodeID: sfau3r42bqbhtz1c6v9hnld67
 Is Manager: false
 Node Address: 192.168.42.51
 Manager Addresses:
  192.168.42.126:2377

worker 2

Swarm: active
 NodeID: m5akhw7j60fru2d0an4lnsgr3
 Is Manager: false
 Node Address: 192.168.42.251
 Manager Addresses:
  192.168.42.126:2377

Services

Now it’s time to test our docker swarm by running a container service across our entire fleet!

For testing purposes we chose 6 replicas of an nginx container:


docker@myengine0:~$ docker service create --replicas 6 -p 80:80 --name web nginx

ql6iogo587ibji7e154m7npal

List images

docker@myengine0:~$  docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
nginx               <none>              db079554b4d2        9 days ago          182 MB

List of services

Depending on your docker registry and your internet connection, it can take a little while until we see all the replicas running:


docker@myengine0:~$ docker service ls
ID            NAME  MODE        REPLICAS  IMAGE
ql6iogo587ib  web   replicated  0/6       nginx:latest

docker@myengine0:~$ docker service ls
ID            NAME  MODE        REPLICAS  IMAGE
ql6iogo587ib  web   replicated  2/6       nginx:latest

docker@myengine0:~$ docker service ls
ID            NAME  MODE        REPLICAS  IMAGE
ql6iogo587ib  web   replicated  3/6       nginx:latest

docker@myengine0:~$ docker service ls
ID            NAME  MODE        REPLICAS  IMAGE
ql6iogo587ib  web   replicated  6/6       nginx:latest
docker@myengine0:~$  docker service ps web

ID            NAME   IMAGE         NODE       DESIRED STATE  CURRENT STATE           ERROR  PORTS
t3v855enecgv  web.1  nginx:latest  myengine1  Running        Running 17 minutes ago
xgwi91plvq00  web.2  nginx:latest  myengine2  Running        Running 17 minutes ago
0l6h6a0va2fy  web.3  nginx:latest  myengine0  Running        Running 16 minutes ago
qchj744k0e45  web.4  nginx:latest  myengine1  Running        Running 17 minutes ago
udimh2bokl8k  web.5  nginx:latest  myengine2  Running        Running 17 minutes ago
t50yhhtngbac  web.6  nginx:latest  myengine0  Running        Running 16 minutes ago

Browser

To verify that our replicas are running as they should:

docker-swarm-nginx.png

Scaling a service

It’s really interesting that we can scale our replicas out or down on the fly!

from the manager


docker@myengine0:~$  docker service ls
ID            NAME  MODE        REPLICAS  IMAGE
ql6iogo587ib  web   replicated  6/6       nginx:latest

docker@myengine0:~$ docker service ps web
ID            NAME   IMAGE         NODE       DESIRED STATE  CURRENT STATE       ERROR  PORTS
t3v855enecgv  web.1  nginx:latest  myengine1  Running        Running 3 days ago
xgwi91plvq00  web.2  nginx:latest  myengine2  Running        Running 3 days ago
0l6h6a0va2fy  web.3  nginx:latest  myengine0  Running        Running 3 days ago
qchj744k0e45  web.4  nginx:latest  myengine1  Running        Running 3 days ago
udimh2bokl8k  web.5  nginx:latest  myengine2  Running        Running 3 days ago
t50yhhtngbac  web.6  nginx:latest  myengine0  Running        Running 3 days ago

Scale Down

from the manager


$ docker service scale web=3
web scaled to 3

docker@myengine0:~$ docker service ls
ID            NAME  MODE        REPLICAS  IMAGE
ql6iogo587ib  web   replicated  3/3       nginx:latest

docker@myengine0:~$ docker service ps web
ID            NAME   IMAGE         NODE       DESIRED STATE  CURRENT STATE       ERROR  PORTS
0l6h6a0va2fy  web.3  nginx:latest  myengine0  Running        Running 3 days ago
qchj744k0e45  web.4  nginx:latest  myengine1  Running        Running 3 days ago
udimh2bokl8k  web.5  nginx:latest  myengine2  Running        Running 3 days ago

Scale Up

from the manager


docker@myengine0:~$ docker service scale web=8
web scaled to 8
docker@myengine0:~$
docker@myengine0:~$ docker service ls
ID            NAME  MODE        REPLICAS  IMAGE
ql6iogo587ib  web   replicated  3/8       nginx:latest
docker@myengine0:~$
docker@myengine0:~$ docker service ls
ID            NAME  MODE        REPLICAS  IMAGE
ql6iogo587ib  web   replicated  4/8       nginx:latest
docker@myengine0:~$
docker@myengine0:~$ docker service ls
ID            NAME  MODE        REPLICAS  IMAGE
ql6iogo587ib  web   replicated  8/8       nginx:latest
docker@myengine0:~$
docker@myengine0:~$
docker@myengine0:~$ docker service ps web
ID            NAME   IMAGE         NODE       DESIRED STATE  CURRENT STATE           ERROR  PORTS
lyhoyseg8844  web.1  nginx:latest  myengine1  Running        Running 7 seconds ago
w3j9bhcn9f6e  web.2  nginx:latest  myengine2  Running        Running 8 seconds ago
0l6h6a0va2fy  web.3  nginx:latest  myengine0  Running        Running 3 days ago
qchj744k0e45  web.4  nginx:latest  myengine1  Running        Running 3 days ago
udimh2bokl8k  web.5  nginx:latest  myengine2  Running        Running 3 days ago
vr8jhbum8tlg  web.6  nginx:latest  myengine1  Running        Running 7 seconds ago
m4jzati4ddpp  web.7  nginx:latest  myengine2  Running        Running 8 seconds ago
7jek2zvuz6fs  web.8  nginx:latest  myengine0  Running        Running 11 seconds ago

February 25, 2017 09:19 PM

February 23, 2017

Steve Kemp's Blog

Rotating passwords

Like many people I use a password-manager to record logins to websites. I previously used a tool called pwsafe, but these days I've switched to using pass.

Although I don't like the fact that the meta-data is exposed, the tool is very useful, and its integration with git is both simple and reliable.

Reading about the security issue that recently affected cloudflare made me consider rotating some passwords. Using git I figured I could look at the last update-time of my passwords. Indeed that was pretty simple:

git ls-tree -r --name-only HEAD | while read filename; do
  echo "$(git log -1 --format="%ad" -- $filename) $filename"
done

Of course that's not quite enough because we want it sorted, and to do that using the seconds-since-epoch is neater. All together I wrote this:

#!/bin/sh
#
# Show password age - should be useful for rotation - we first of all
# format the timestamp of every *.gpg file, as both unix+relative time,
# then we sort, and finally we output that sorted data - but we skip
# the first field which is the unix-epoch time.
#
( git ls-tree -r --name-only HEAD | grep '\.gpg$' | while read filename; do \
      echo "$(git log -1 --format="%at %ar" -- $filename) $filename" ; done ) \
        | sort | awk '{for (i=2; i<NF; i++) printf $i " "; print $NF}'

Not the cleanest script I've ever hacked together, but the output is nice:

 steve@ssh ~ $ cd ~/Repos/personal/pass/
 steve@ssh ~/Repos/personal/pass $ ./password-age | head -n 5
 1 year, 10 months ago GPG/root@localhost.gpg
 1 year, 10 months ago GPG/steve@steve.org.uk.OLD.gpg
 1 year, 10 months ago GPG/steve@steve.org.uk.NEW.gpg
 1 year, 10 months ago Git/git.steve.org.uk/root.gpg
 1 year, 10 months ago Git/git.steve.org.uk/skx.gpg

Now I need to pick the sites that are more than a year old and rotate credentials. Or delete accounts, as appropriate.

February 23, 2017 10:00 PM


Electricmonk.nl

How to solve RPMs created by Alien having file conflicts

I generate release packages for my software with Alien, which amongst other things converts .deb packages to .rpm.

On Fedora 24 however, the generated RPMs cause a small problem when installed with Yum:

Transaction check error:
  file / from install of cfgtrack-1.0-2.noarch conflicts with file from package filesystem-3.2-20.el7.x86_64
  file /usr/bin from install of cfgtrack-1.0-2.noarch conflicts with file from package filesystem-3.2-20.el7.x86_64

There's a bit of info to be found on the internet about this problem, with most of the posts suggesting using rpmrebuild to fix it. Unfortunately, it looks like rpmrebuild actually requires the package to be installed, and since I don't actually use an RPM-based system, that was a bit of a no-go.

So here's how to fix those packages manually:

First, use Alien to generate an RPM package folder from a Debian package, but don't generate the actual package yet. You can do so with the -g switch:

alien -r -g -v myproject-1.0.deb

This generates a myproject-1.0 directory containing the root fs for the package as well as a myproject-1.0-2.spec file. This spec file is the actual problem. It defines directories for paths such as / and /usr/bin. But those are already provided by the filesystem package, so we shouldn't include them.

You can remove them from a script using sed:

sed -i 's#%dir "/"##' myproject-1.0/myproject-1.0-2.spec
sed -i 's#%dir "/usr/bin/"##' myproject-1.0/myproject-1.0-2.spec

This edits the spec file in-place and replaces the following lines with empty lines:

%dir "/"
%dir "/usr/bin/"

The regular expressions look somewhat different than usual, because I'm using the pound (#) sign as the delimiter instead of "/".

Finally, we can recreate the package using rpmbuild:

cd myproject-1.0
rpmbuild --target=noarch --buildroot /full/path/to/myproject-1.0/ \
         -bb myproject-1.0-2.spec

The resulting package should install without errors or warnings now.

by admin at February 23, 2017 06:27 AM

February 22, 2017

SysAdmin1138

Bad speaker advice

Last year, while I was developing my talks, I saw a bit of bad advice. I didn't recognize it at the time. Instead, I saw it as a goal to reach. The forum was a private one, and I've long forgotten who the players were. But here is a reconstructed, summarized view of what spurred me to try:

elph1120: You know what I love? A speaker who can do an entire talk from one slide.
612kenny: OMG yes. I saw someguy do that at someconference. It was amazeballs.
elph1120: Yeah, more speakers should do that.
gryphon: Totally.

This is bad advice. Don't do this.

Now to explain what happened...

I saw this, and decided to try and do that for my DevOpsDays Minneapolis talk last year. I got close: I needed 4 slides. Which is enough to fit into a tweet.

See? No link to SlideShare needed! Should be amazing!

It wasn't.

The number one critique I got, by a large, large margin was this:

Wean yourself from the speaker-podium.

In order to do a 4-slide talk, I had to lean pretty hard on speaker-notes. If you're leaning on speaker-notes, you're either tied to the podium or have cue-cards in your hands. Both of these are violations of the modern TED-talk style-guide tech-conferences are following these days. I should have noticed that the people rhapsodizing over one-slide talks were habitues of one of the holdouts of podium-driven talks in the industry.

That said, there is another way to do a speaker-note free talk: the 60-slide deck for a 30 minute talk. Your slides are the notes. So long as you can remember some points to talk about above and beyond what's written on the slides, you're providing value above and beyond the deck you built. The meme-slide laugh inducers provide levity and urge positive feedback. If you're new to speaking this is the style you should be aiming for.

A one-slide talk is PhD-level speaking skills. It means memorizing a 3K-word essay paragraph by paragraph, and reading it back while on stage and on camera. You should not be trying to reach this bar until you're already comfortable with public speaking, and have delivered that talk a bunch of times already.

by SysAdmin1138 at February 22, 2017 04:27 PM

February 21, 2017

OpenSSL

OpenSSL and Threads

This post talks about OpenSSL and threads. In particular, using OpenSSL in multi-threaded applications. It traces through the history, explains what was changed for the 1.1.0 release, and will hopefully provide some guidance to developers.

While none of the behaviors have really changed, and therefore none of this should be new information, the documentation has not been as clear as it could, or should, be. Therefore, some readers might be surprised by what’s in this post.

In short, OpenSSL has always, and only, supported the concept of locking an object and sometimes it locks its internal objects. Read on for more details.

In OpenSSL 1.0.2 (and earlier), applications had to provide their own integration with locking and threads, as documented in the threads.pod file. This page starts with the following unfortunate text:

OpenSSL can safely be used in multi-threaded applications provided that at least two callback functions are set, …

The problem is that the word safely was never actually defined. This led many developers to think that OpenSSL’s objects could safely be used by multiple threads at the same time. For example, using a single SSL object in two threads, one each for reading and writing.

The file crypto/th-lock.c in 1.0.2 had a “sample” implementation of the lock callbacks, for a number of platforms. The intent was that applications take that file, make it work on their platform and include it in their application. Many apparently managed to do that, although I wryly point out this comment from the Git history in early 2000:

At least, this compiles nicely on Linux using PTHREADS. I’ve done no other tests so far.

With 1.1.0, the major change was to move this from run-time to compile-time. If threads support is enabled (still the same misleading name), then the native threads package is used. On Windows, this means using the “critical section” API for locks, as can be seen in crypto/threads_win.c. On all other platforms, the pthreads API is used, crypto/threads_pthread.c. In cases where the native threads facility isn’t known, or if explicitly configured with no-threads then dummy functions are used, crypto/threads_none.c.

It is worth looking at at least one of these files. They are all very small, 120 to 170 lines, and can provide an overview of what OpenSSL actually needs and provides. The documentation is a bit better, but could still give the impression that concurrent use of objects in multiple threads is supported. We should fix that, soon.

So what is supported? One way to find out is to do

    git grep CRYPTO_THREAD_.*lock

If you do this, you’ll see that there is some locking around internal lookup tables for engines, error codes, and some RSA and X.509 lookup operations, and also around SSL sessions. Almost none of this behavior is documented, and it’s risky to plan on future behavior from just looking at the current code, but in general you can think that either tables of things (errors, sessions) or low-level internal details (RSA blinding, RSA Montgomery optimizations) are safe. And most of the time, you probably won’t even think of that at all, anyway.

The other place where threads support comes in, isn’t necessarily about threads. It’s about maintaining reference counts on objects, and ensuring that an object is not free’d until the count hits zero. This is implemented using per-object locks, and the CRYPTO_atomic_add API. You can see some relatively simple examples of the up_ref API in test/crltest.c which also shows one of the common use-cases: adding an object to a container, but still “needing” that object for later, after the container is removed. Most datatypes have their own XXX_up_ref API.

Internally, OpenSSL is moving to do more things using native facilities. For example, the auto-init is done using “run-once” facilities, and the error state is maintained using thread-local storage. I’m not going to provide pointers to the source, because I don’t want to encourage people to rely on undocumented behavior. But, what you can rely on is that if you natively use your system threads facility, OpenSSL will work seamlessly with it. But you still can’t concurrently use most objects in multiple threads.

February 21, 2017 11:00 AM

Colin Percival

Cheating on a string theory exam

And now for something completely different: I've been enjoying FiveThirtyEight's "The Riddler" puzzles. A couple weeks ago I submitted a puzzle of my own; but I haven't heard back and it's too cute a puzzle to not share, so I've decided to post it here.

February 21, 2017 03:50 AM

February 20, 2017

Electricmonk.nl

Reliable message delivery with Mosquitto (MQTT)

I was looking for a message queue that could reliably handle messages in such a way that I was guaranteed never to miss one, even if the consumer is offline or crashes. Mosquitto (MQTT) comes very close to that goal. However, it wasn't directly obvious how to configure it to be as reliable as possible. So this post describes how to use Mosquitto to ensure the most reliable delivery it can handle.

TL;DR: You can't

If you want to do reliable message handling with Mosquitto, the short answer is: You can't. For the long answer, read the rest of the article. Or if you're lazy and stubborn, read the "Limitations" section further down. ;-)

Anyway, let's get on with the show and see how close Mosquitto can get.

Quick overview of Mosquitto

Here's a quick schematic of Mosquitto components:

+----------+     +--------+     +----------+
| producer |---->| broker |---->| consumer |
+----------+     +--------+     +----------+

The producer sends messages to a topic on the broker. The broker maintains an internal state of topics and which consumers are interested in which topics. It also maintains a queue of messages which still need to be sent to each consumer. How the broker decides what and when to send to which consumer depends on settings such as the QoS (Quality of Service) and what kind of session the consumer is opening.

Producer and consumer settings

Here's a quick overview of settings that ensure the highest available quality of delivery of messages with Mosquitto. When creating a consumer or producer, ensure you set these settings properly:

  • quality-of-service must be 2.
  • The consumer must send a client_id.
  • clean_session on the consumer must be False.

These are the base requirements to ensure that each consumer will receive messages exactly once, even if they've been offline for a while. The quality-of-service setting of 2 ensures that the broker requires acknowledgement from the consumer that a message has been received properly. Only then does the broker update its internal state to advance the consumer to the next message in the queue. If the client crashes before acknowledging the message, it'll be resent the next time.

The client_id gives the broker a unique name under which to store session state information such as the last message the consumer has properly acknowledged. Without a client_id, the broker cannot do this.

The clean_session setting lets the consumer inform the broker about whether it wants its session state remembered. Without it, the broker assumes the consumer does not care about past messages and such. It will only receive any new messages that are produced after the consumer has connected to the broker.

Together these settings ensure that messages are reliably delivered from the producer to the broker and to the consumer, even if the consumer has been disconnected for a while or crashes while receiving the message.

Broker settings

The following settings are relevant configuration options on the broker. You can generally find these settings in /etc/mosquitto/mosquitto.conf.

  • The broker must have persistence set to True in the broker configuration.
  • You may want to set max_inflight_messages to 1 in the broker configuration to ensure correct ordering of messages.
  • Configure max_queued_messages to the maximum number of messages to retain in a queue.
  • Tweak autosave_interval to how often you want the broker to write the in-memory database to disk.

The persistence setting informs the broker that you'd like session state and message queues written to disk. If the broker goes down for some reason, the messages will (mostly) still be there.

You can ensure that messages are sent to consumers in the same order as they were sent to the broker by the producers by setting the max_inflight_messages setting to 1. This will probably severely limit the throughput speed of messages.

The max_queued_messages setting determines how many unconfirmed messages should maximally be retained in queues. This should basically be the product of the maximum number of messages per second and the maximum time a consumer might be offline. Say we're processing 1 message per second and we want the consumer to be able to be offline for 2 hours (= 7200 seconds), then the max_queued_messages setting should be 1 * 7200 = 7200.

The autosave_interval determines how often you want the broker to write the in-memory database to disk. I suspect that setting this to a very low value will cause heavy disk I/O activity.
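
To tie these together, here is what a minimal broker configuration along these lines might look like in /etc/mosquitto/mosquitto.conf. The concrete values (and the persistence_location path) are only illustrative assumptions based on the 1-message-per-second, 2-hour example above; check the mosquitto.conf man page for your version before relying on them.

# Keep session state and message queues across broker restarts
persistence true
persistence_location /var/lib/mosquitto/

# At most one in-flight message per client, to preserve message ordering
max_inflight_messages 1

# Queue up to 2 hours worth of messages at 1 msg/s for offline consumers
max_queued_messages 7200

# Write the in-memory database to disk every 60 seconds
autosave_interval 60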

Examples

Here's an example of a producer and consumer:

producer.py:

import paho.mqtt.client as paho
import time

client = paho.Client(protocol=paho.MQTTv31)
client.connect("localhost", 1883)
client.loop_start()
client.publish("mytesttopic", str("foo"), qos=2)
time.sleep(1)  # Give the client loop time to process the message

consumer.py:

import paho.mqtt.client as paho

def on_message(client, userdata, msg):
    print(msg.topic+" "+str(msg.qos)+" "+str(msg.payload))

client = paho.Client("testclient", clean_session=False, protocol=paho.MQTTv31)
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("mytesttopic", qos=2)
client.loop_forever()

Pitfalls

There are a few pitfalls I ran into when using Mosquitto:

  • If the broker or one of the clients doesn't support the MQTT v3.1.1 protocol, things will fail silently. So I specify MQTTv31 manually.
  • The client loop needs some time to process the sending and receiving of messages. If you send a single message and exit your program right away, the loop doesn't have time to actually send the message.
  • The subscriber must have already run once before the broker will start keeping messages for it. Otherwise, the broker has no idea that a consumer with QoS=2 is interested in messages (and would have to keep messages for ever). So register your consumer once by just running it, before the producer runs.

Limitations

Although the settings above make exchanging messages with Mosquitto more reliable, there are still some downsides:

  • Exchanging messages in this way is obviously slower than having no consistency checks in place.
  • Since the Mosquitto broker only writes the in-memory database to disk every X (where X is configurable) seconds, you may lose data if the broker crashes.
  • On the consumer side, it is the MQTT library that confirms the receipt of the message. However, as far as I can tell, there is no way to manually confirm the receipt of a message. So if your client crashes while handling a message, rather than while it is receiving a message, you may still lose the message. If you wish to handle this case, you can store the message on the client as soon as possible. This is, however, not much more reliable. The only other way is to implement some manual protocol via the exchange of messages, where the original publisher retains a message and resends it unless it's been acknowledged by the consumer (a sketch of such a scheme follows below).
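
Below is a minimal sketch, using paho-mqtt, of what such an application-level acknowledgement protocol could look like on the publisher side. This is not something Mosquitto or MQTT gives you; the ack topic name, the message-id convention and the retransmit interval are all assumptions made up for this example.

import time
import paho.mqtt.client as paho

DATA_TOPIC = "mytesttopic"        # data topic, as in the examples above
ACK_TOPIC = "mytesttopic/ack"     # hypothetical application-level ack topic

acked = set()

def on_ack(client, userdata, msg):
    # The consumer publishes the application-level message id here once it
    # has fully processed the corresponding data message.
    acked.add(msg.payload.decode())

publisher = paho.Client("reliable-producer", clean_session=False,
                        protocol=paho.MQTTv31)
publisher.on_message = on_ack
publisher.connect("localhost", 1883)
publisher.subscribe(ACK_TOPIC, qos=2)
publisher.loop_start()

msg_id = "42"                     # application-level id, not the MQTT mid
while msg_id not in acked:
    publisher.publish(DATA_TOPIC, msg_id + ":payload goes here", qos=2)
    time.sleep(5)                 # retransmit until the consumer acks msg_id

publisher.loop_stop()

The consumer would publish the same message id to the ack topic only after it has safely handled (for example, persisted) the data message, so a crash while processing simply means the publisher retransmits it later.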

Conclusion

In other words, as far as I can see, you cannot do reliable message handling with Mosquitto. If your broker crashes or your client crashes, Mosquitto will lose your messages. Other than that, if all you require is reliable delivery of messages to the client, you're good to go.

So what are the alternatives? At this point, I have to be honest and say: I don't know yet. I'm personally looking for a lightweight solution, and it seems none of the lightweight message queues do reliable message handling (as opposed to reliable message delivery, which most do just fine).

When I find an answer, I'll let you know here.

by admin at February 20, 2017 07:10 PM