Planet SysAdmin


Upcoming sysadmin conferences/events

Contact me to have your event listed here.


July 24, 2014

The Tech Teapot

Software the old fashioned way

I was clearing out my old bedroom, after many years of nagging by my parents, when I came across two of my old floppy disk boxes. Contained within them is a small snapshot of my personal computing from 1990 through until late 1992. Everything before and after those dates doesn’t survive, I’m afraid.

[Photos: XLISP source code disk, CLIPS expert system sources disk, Little Smalltalk interpreter disk, Micro EMACS disks]

The archive contains loads of backups of work I produced, now stored on Github, as well as public domain / shareware software, magazine cover disks and commercial software I purchased. Yes, people used to actually buy software. With real money. A PC game back in the late 1980s cost around £50 in 1980s money. According to this historic inflation calculator, that would be £117 now. Pretty close to a week’s salary for me at the time.

One of my better discoveries from the late 1980s was public domain and shareware software libraries. Back then there were a number of libraries, usually advertised in the small ads at the back of computer magazines.

This is a rundown of how you’d use your typical software library:

  1. Find an advert from a suitable library and write them a nice little letter requesting they send you a catalog. Include payment as necessary;
  2. Wait for a week or two;
  3. Receive a small, photocopied catalog with lists of floppies and a brief description of the contents;
  4. Send the order form back to the library with payment, usually by cheque;
  5. Wait for another week or two;
  6. Receive a small padded envelope through the post with your selection of floppies;
  7. Explore and enjoy!

If you received your order in two weeks you were doing well. After the first order, once you had your catalog to hand, you could get your order in around a week. A week was pretty quick for pretty well anything back then.

The libraries were run as small home businesses. They were the perfect second income. Everything was done by mail, all you had to do was send catalogs when requested and process orders.

One of the really nice things about shareware libraries was that you never really knew what you were going to get. Whilst you’d have an idea of what was on the disk from the description in the catalog, there’d be a lot of programs that were not described. Getting a new delivery was like a mini MS-DOS based text adventure, discovering all of the neat things on the disks.

The libraries contained lots of different things, mostly shareware applications of every kind you can think of. The most interesting to me as an aspiring programmer was the array of public domain software. Public domain software was distributed with the source code. There is no better learning tool when programming than reading other people’s code. The best code I’ve ever read was the CLIPS sources for a forward chaining expert system shell written by NASA.

Happy days :)

PS All of the floppies I’ve tried so far still work :) Not bad after 23 years.

The post Software the old fashioned way appeared first on Openxtra Tech Teapot.

by Jack Hughes at July 24, 2014 07:00 AM

Chris Siebenmann

What influences SSH's bulk transfer speeds

A number of years ago I wrote How fast various ssh ciphers are because I was curious about just how fast you could do bulk SSH transfers and how to get them to go fast under various circumstances. Since then I have learned somewhat more about SSH speed and about what controls which options you actually have available.

To start with, my entry from years ago was naively incomplete, because SSH encryption has two components: a cipher and a cryptographic hash used as the MAC. The choice of both of them can matter, especially if you're willing to deliberately weaken the MAC. As an example of how much of an impact this might make, in my testing on a Linux machine I could almost double SSH bandwidth by switching from the default MAC to 'umac-64-etm@openssh.com'.

(At the same time, no other MAC choice made much of a difference within a particular cipher, although hmac-sha1 was sometimes a bit faster than hmac-md5.)
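If you want to measure this sort of thing yourself, a crude but effective approach is to push a fixed amount of data through ssh while varying the cipher and MAC. A minimal sketch, with the host name and the particular cipher and MAC as stand-ins for whatever combinations you're comparing:

time ( dd if=/dev/zero bs=1M count=1024 2>/dev/null | ssh -c aes128-ctr -m umac-64-etm@openssh.com somehost 'cat > /dev/null' )

Run each combination a few times so a single noisy run doesn't mislead you.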

Clients set the cipher list with -c and the MAC with -m, or with the Ciphers and MACs options in your SSH configuration file (either a personal one or a global one). However, what the client wants to use has to be both supported by the server and accepted by it; this is set in the server's Ciphers and MACs configuration options. The manpages for ssh_config and sshd_config on your system will hopefully document both what your system supports at all and what it's set to accept by default. Note that this is not necessarily the same thing; I've seen systems where sshd knows about ciphers that it will not accept by default.

(Some modern versions of OpenSSH also report this information through 'ssh -Q <option>'; see the ssh manpage for details. Note that such lists are not necessarily reported in preference order.)
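For example, on an OpenSSH new enough to have -Q, you can list the locally supported algorithms directly (again, the output order is not preference order):

ssh -Q cipher
ssh -Q mac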

At least some SSH clients will tell you the server's list of acceptable ciphers (and MACs) if you tell the client to use options that the server doesn't support. If you wanted to, I suspect that you could write a program in some language with SSH protocol libraries that dumped all of this information for you for an arbitrary server (without the fuss of having to find a cipher and MAC that your client knew about but your server didn't accept).

Running 'ssh -v' will report the negotiated cipher and MAC that are being used for the connection. Technically there are two sets of them, one for the client to server and one for the server back to the client, but I believe that under all but really exceptional situations you'll use the same cipher and MAC in both directions.

Different Unix OSes may differ significantly in their support for both ciphers and MACs. In particular Solaris effectively forked a relatively old version of OpenSSH and so modern versions of Illumos (and Illumos distributions such as OmniOS) do not offer you anywhere near a modern list of choices here. How recent your distribution is will also matter; our Ubuntu 14.04 machines naturally offer us a lot more choice than our Ubuntu 10.04 ones.

PS: helpfully the latest OpenSSH manpages are online (cf), so the current manpage for ssh_config will tell you the latest set of ciphers and MACs supported by the official OpenSSH and also show the current preference order. To my interest it appears that OpenSSH now defaults to the very fast umac-64-etm MAC.

by cks at July 24, 2014 03:22 AM

July 23, 2014

The Tech Teapot

Early 1990s Software Development Tools for Microsoft Windows

The early 1990s were an interesting time for software developers. Many of the tools that are taken for granted today made their debut for a mass market audience.

I don’t mean that the tools were not available previously. Both Smalltalk and LISP sported what would today be considered modern development environments all the way back in the 1970s, but hardware requirements put the tools well beyond the means of the regular joe programmer. Not too many people had workstations at home, or in the office for that matter.

I spent the early 1990s giving most of my money to software development tool companies of one flavour or another.

[Photos: Actor v4.0 floppy disks, Whitewater Object Graphics floppy disk, MKS LEX/YACC floppy disks, Turbo Pascal for Windows floppy disks]

Actor was a combination of object oriented language and programming environment for very early versions of Microsoft Windows. There is a review in Info World magazine of Actor version 3 that makes interesting reading. It was somewhat similar to Smalltalk, but rather more practical for building distributable programs. Unlike Smalltalk it was not cross-platform, but on the plus side programs did look like native Windows programs. It was very much ahead of its time in terms of both the language and the programming environment, and ran on pretty modest hardware.

I gave Borland quite a lot of money too. I bought Turbo Pascal for Windows when it was released, having bought regular old Turbo Pascal v6 for DOS a year or so earlier. The floppy disks don’t have a version number on them, so I have no idea which version it is.

I bought Microsoft C version 6; although it introduced a DOS based IDE, it was still very much an old school C compiler. If you wanted to create Windows software you needed to buy the Microsoft Windows SDK at considerable extra cost.

Asymetrix Toolbook was marketed in the early 1990s as a generic Microsoft Windows development tool. There are old Info World reviews here and here. Asymetrix later repositioned the product as a learning authoring tool. I rather liked the tool, though it didn’t really have the performance and flexibility I was looking for. Distributing your finished work was also not a strong point.

Microsoft Quick C for Windows version 1.0 was released in late 1991. Quick C bundled a C compiler with the Windows SDK so that you could build 16 bit Windows software. It also sported an integrated C text editor, resource editor and debugger.

The first version of Visual Basic was released in 1991. I am not sure why I didn’t buy it; I imagine there was some programming language snobbery on my part. I know there are plenty of programmers of a certain age who go all glassy eyed at the mere thought of BASIC, but I’m not one of them. Visual Basic also had an integrated editor and debugger.

Both Quick C and Visual Basic are the immediate predecessors of the Visual Studio product of today.

The post Early 1990s Software Development Tools for Microsoft Windows appeared first on Openxtra Tech Teapot.

by Jack Hughes at July 23, 2014 10:59 AM

Chris Siebenmann

One of SELinux's important limits

People occasionally push SELinux as the cure for security problems and look down on people who routinely disable it (as we do). I have some previously expressed views on this general attitude, but what I feel like pointing out today is that SELinux's security has some important intrinsic limits. One big one is that SELinux only acts at process boundaries.

By its nature, SELinux exists to stop a process (or a collection of them) from doing 'bad things' to the rest of the system and to the outside environment. But there are any number of dangerous exploits that do not cross a process's boundaries this way; the most infamous recent one is Heartbleed. SELinux can do nothing to stop these exploits because they happen entirely inside the process, in spheres fully outside its domain. SELinux can only act if the exploit seeks to exfiltrate data (or influence the outside world) through some new channel that the process does not normally use, and in many cases the exploit doesn't need to do that (and often doesn't bother).

Or in short, SELinux cannot stop your web server or your web browser from getting compromised, only from doing new stuff afterwards. Sending all of the secrets that your browser or server already has access to off to someone in the outside world? There's nothing SELinux can do about that (assuming that the attacker is competent). This is a large and damaging territory that SELinux doesn't help with.

(Yes, yes, privilege separation. There are a number of ways in which this is the mathematical security answer instead of the real one, including that most network related programs today are not privilege separated. Chrome exploits also have demonstrated that privilege separation is very hard to make leak-proof.)

by cks at July 23, 2014 04:25 AM

July 22, 2014

Everything Sysadmin

Tom will be the October speaker at Philly Linux Users Group (central)

I've just been booked to speak at PLUG/Central in October. I'll be speaking about our newest book, The Practice of Cloud System Administration.

For a list of all upcoming speaking engagements, visit our appearances page: http://everythingsysadmin.com/appearances/

July 22, 2014 07:40 PM

Standalone Sysadmin

Goverlan press release on the Boston SysAdmin Day party!

Thanks again to the folks over at Goverlan for supporting my SysAdmin Appreciation Day party here in Boston! They're so excited, they issued a Press Release on it!

Goverlan Press Release


I'm really looking forward to seeing everyone on Friday. We've got around 60 people registered, and I imagine we'll have more soon. Come have free food and drinks. Register now!

by Matt Simmons at July 22, 2014 02:12 PM

Racker Hacker

Etsy reminds us that information security is an active process

I’m always impressed with the content published by folks at Etsy and Ben Hughes’ presentation from DevOpsDays Minneapolis 2014 is no exception.

Ben adds some levity to the topic of information security with some hilarious (but relevant) images and reminds us that security is an active process that everyone must practice. Everyone plays a part — not just traditional corporate security employees.

I’ve embedded the presentation here for your convenience:

Here’s a link to the original presentation on SpeakerDeck:

Etsy reminds us that information security is an active process is a post from: Major Hayden's blog.


by Major Hayden at July 22, 2014 01:06 PM

Chris Siebenmann

What I know about the different types of SSH keys (and some opinions)

Modern versions of SSH support up to four different types of SSH keys (both for host keys to identify servers and for personal keys): RSA, DSA, ECDSA, and as of OpenSSH 6.5 we have ED25519 keys as well. Both ECDSA and ED25519 use elliptic curve cryptography, DSA uses finite fields, and RSA is based on integer factorization. EC cryptography is said to have a number of advantages, particularly in that it uses smaller key sizes (and thus needs smaller exchanges on the wire to pass public keys back and forth).

(One discussion of this is this cloudflare blog post.)

RSA and DSA keys are supported by all SSH implementations (well, all SSH v2 implementations which is in practice 'all implementations' these days). ECDSA keys are supported primarily by reasonably recent versions of OpenSSH (from OpenSSH 5.7 onwards); they may not be in other versions, such as the SSH that you find on Solaris and OmniOS or on a Red Hat Enterprise 5 machine. ED25519 is only supported in OpenSSH 6.5 and later, which right now is very recent; of our main machines, only the Ubuntu 14.04 ones have it (especially note that it's not supported by the RHEL 7/CentOS 7 version of OpenSSH).

(I think ED25519 is also supported on Debian test (but not stable) and on up to date current FreeBSD and OpenBSD versions.)

SSH servers can offer multiple host keys in different key types (this is controlled by what HostKey files you have configured). The order that OpenSSH clients will try host keys in is controlled by two things: the setting of HostKeyAlgorithms (see 'man ssh_config' for the default) and what host keys are already known for the target host. If no host keys are known, I believe that the current order is ECDSA, ED25519, RSA, and then DSA; once there are known keys, they're tried first. What this really means is that for an otherwise unknown host you will be prompted to save the first of these key types that the host has and thereafter the host will be verified against it. If you already know an 'inferior' key (eg an RSA key when the host also advertises an ECDSA key), you will verify the host against the key you know and, as far as I can tell, not even save its 'better' key in .ssh/known_hosts.

(If you have a mixture of SSH client versions, people can wind up with a real mixture of your server key types in their known_hosts files or equivalent. This may mean that you need to preserve and restore multiple types of SSH host keys over server reinstalls, and probably add saving and restoring ED25519 keys when you start adding Ubuntu 14.04 servers to your mix.)
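A minimal sketch of what preserving host keys across a reinstall can look like, assuming the stock /etc/ssh key locations and an archive name of my own invention (the sshd service name varies by OS):

tar czf hostkeys-$(hostname).tar.gz /etc/ssh/ssh_host_*_key /etc/ssh/ssh_host_*_key.pub
# ... reinstall the machine ...
tar xzf hostkeys-$(hostname).tar.gz -C /
service ssh restart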

In terms of which key type is 'better', some people distrust ECDSA because the elliptic curve parameters are magic numbers from NIST and so could have secret backdoors, as appears to be all but certain for another NIST elliptic curve based cryptography standard (see also and also and more). I reflexively dislike both DSA and ECDSA because DSA implementation mistakes can be explosively fatal, as in 'trivially disclose your private keys'. While ED25519 is also a DSA-style algorithm, it takes specific steps to avoid at least some of the explosive failures of plain DSA and ECDSA, failures that have led to eg the compromise of Sony's Playstation 3 signing keys.

(RFC 6979 discusses how to avoid this particular problem for DSA and ECDSA but it's not clear to me if OpenSSH implements it. I would assume not until explicitly stated otherwise.)

As a result of all of this I believe that the conservative choice is to advertise and use only RSA keys (both host keys and personal keys) with good bit sizes. The slightly daring choice is to use ED25519 when you have it available. I would not use either ECDSA or DSA although I wouldn't go out of my way to disable server ECDSA or DSA host keys except in a very high security environment.

(I call ED25519 'slightly daring' only because I don't believe it's undergone as much outside scrutiny as RSA, and I could be wrong about that. See here and here for a discussion of ECC and ED25519 security properties and general security issues. ED25519 is part of Dan Bernstein's work and in general he has a pretty good reputation on these issues. Certainly the OpenSSH people were willing to adopt it.)
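For what it's worth, generating personal keys along these lines is a one-liner either way; a sketch, with the file names and the 4096-bit size simply being my illustrative choices:

ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519

The second one only works with OpenSSH 6.5 or later on your end, and of course the servers you talk to need to accept ED25519 keys too.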

PS: If you want to have your eyebrows raised about the choice of elliptic curve parameters, see here.

PPS: I don't know what types of keys non-Unix SSH clients support over and above basic RSA and DSA support. Some casual Internet searches suggest that PuTTY doesn't support ECDSA yet, for example. And even some Unix software may have difficulties; for example, currently GNOME Keyring apparently doesn't support ECDSA keys (via archlinux).

by cks at July 22, 2014 03:13 AM

July 21, 2014

Racker Hacker

icanhazip and CORS

I received an email from an icanhazip.com user last week about enabling cross-origin resource sharing. He wanted to use AJAX calls on a different site to pull data from icanhazip.com and use it for his visitors.

Those headers are now available for all requests to the services provided by icanhazip.com! Here’s what you’ll see:

$ curl -i icanhazip.com
---
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET
---

icanhazip and CORS is a post from: Major Hayden's blog.


by Major Hayden at July 21, 2014 02:31 PM

Steve Kemp's Blog

An alternative to devilspie/devilspie2

Recently I was updating my dotfiles, because I wanted to ensure that media-players were "always on top", when launched, as this suits the way I work.

For many years I've used devilspie to script the placement of new windows, and once I googled a recipe I managed to achieve my aim.

However during the course of my googling I discovered that devilspie is unmaintained, and has been replaced by something using Lua - something I like.

I'm surprised I hadn't realized that the project was dead, although I've always hated the configuration syntax it is something that I've used on a constant basis since I found it.

Unfortunately the replacement, despite using Lua, and despite being functional, just didn't seem to gel with me. So I figured "How hard could it be?".

In the past I've written software which iterated over all (visible) windows, and obviously I'm no stranger to writing Lua bindings.

However I did run into a snag. My initial implementation did two things:

  • Find all windows.
  • For each window invoke a lua script-file.

This worked. This worked well. This worked too well.

The problem I ran into was that if I wrote something like "Move window 'emacs' to desktop 2" that action would be applied, over and over again. So if I launched emacs, and then manually moved the window to desktop3 it would jump back!

In short I needed to add a "stop()" function, which would cause further actions against a given window to cease. (By keeping a linked list of windows-to-ignore, and avoiding processing them.)

The code did work, but it felt wrong to have an ever-growing linked-list of processed windows. So I figured I'd look at the alternative - the original devilspie used libwnck to operate. That library allows you to nominate a callback to be executed every time a new window is created.

If you apply your magic only on a window-create event - well you don't need to bother caching prior-windows.

So in conclusion :

I think my code is better than devilspie2 because it is smaller, simpler, and does things more neatly - for example instead of a function to get geometry and another to set it, I use one. (e.g. "xy()" returns the position of a window, but xy(3,3) sets it.)

kpie also allows you to run as a one-off job, and using the simple primitives I wrote a file to dump your windows, and their size/placement, which looks like this:

shelob ~/git/kpie $ ./kpie --single ./samples/dump.lua
-- Screen width : 1920
-- Screen height: 1080
..
if ( ( window_title() == "Buddy List" ) and
     ( window_class() == "Pidgin" ) and
     ( window_application() == "Pidgin" ) ) then
     xy(1536,24 )
     size(384,1032 )
     workspace(2)
end
if ( ( window_title() == "feeds" ) and
     ( window_class() == "Pidgin" ) and
     ( window_application() == "Pidgin" ) ) then
     xy(1,24 )
     size(1536,1032 )
     workspace(2)
end
..

As you can see that has dumped all my windows, along with their current state. This allows a simple starting-point - Configure your windows the way you want them, then dump them to a script file. Re-run that script file and your windows will be set back the way they were! (Obviously there might be tweaks required.)

I used that starting-point to define a simple recipe for configuring pidgin, which is more flexible than what I ever had with pidgin, and suits my tastes.

Bug-reports welcome.

July 21, 2014 02:30 PM

Chris Siebenmann

The CBL has a real false positive problem

As I write this, a number of IP addresses in 128.100.1.0/24 are listed in the CBL, and various of them have been listed for some time. There is a problem with this: these CBL-listed IP addresses don't exist. I don't mean 'they aren't supposed to exist'; I mean 'they could only theoretically exist on a secure subnet in our machine room and even if they did exist our firewall wouldn't allow them to pass traffic'. So these IP addresses don't exist in a very strong sense. Yet the CBL lists them and has for some time.

The first false positive problem the CBL has is that they are listing this traffic at all. We have corresponded with the CBL about this and these listings (along with listings on other of our subnets) all come from traffic observed at a single one of their monitoring points. Unlike what I assumed in the past, these observations are not coming from parsing Received: headers but from real TCP traffic. However they are not connections from our network, and the university is the legitimate owner and router of 128.100/16. A CBL observation point that is using false routing (and is clearly using false routing over a significant period of time) is an actively dangerous thing; as we can see here, false routing can cause the CBL to list anything.

The second false positive problem the CBL has is that, as mentioned, we have corresponded with the CBL over this. In that correspondence the CBL spokesperson agreed that the CBL was incorrect in this listing and would get it fixed. That was a couple of months ago, yet a revolving cast of 128.100.1.0/24 IP addresses still gets listed and relisted in the CBL. As a corollary of this, we can be confident that the CBL listening point(s) involved are still using false routes for some of their traffic. You can apply charitable or less charitable assumptions for this lack of actual action on the CBL's part; at a minimum it is clear that some acknowledged false positive problems go unfixed for whatever reason.

I don't particularly have a better option than the CBL these days. But I no longer trust it anywhere near as much as I used to and I don't particularly like its conduct here.

(And I feel like saying something about it so that other people can know and make their own decisions. And yes, the situation irritates me.)

(As mentioned, we've seen similar issues in the past, cf my original 2012 entry on the issue. This time around we've seen it on significantly more IP addresses, we have extremely strong confidence that it is a false positive problem, and most of all we've corresponded with the CBL people about it.)

by cks at July 21, 2014 03:04 AM

July 20, 2014

Ben's Practical Admin Blog

SysAdmin Day is Nearly Here! July 25th

July 25th marks the 15th annual System Administrator Appreciation Day. SysAdmin Day, as it is known, is an international event held on the last Friday of July each year, a day when businesses and colleagues can express their appreciation for the work IT professionals do, often without any recognition. There are many […]

by Ben at July 20, 2014 07:41 AM

Chris Siebenmann

HTTPS should remain genuinely optional on the web

I recently ran across Mozilla Bug 1041087 (via HN), which has the sort of harmless-sounding title of 'Switch generic icon to negative feedback for non-https sites'. Let me translate this to English: 'try to scare users if they're connecting to a non-https site'. For anyone who finds this attractive, let me say it flat out: this is a stupid idea on today's web.

(For the record, I don't think it's very likely that Mozilla will take this wishlist request seriously. I just think that there are people out there who wish that they would.)

I used to be very down on SSL Certificate Authorities, basically considering the whole thing a racket. It remains a racket but in today's environment of pervasive eavesdropping it is now a useful one; one might as well make the work of those eavesdroppers somewhat harder. I would be very enthusiastic for pervasive encryption if we could deploy that across the web.

Unfortunately we can't, exactly because of the SSL CA racket. Today having a SSL certificate means either scaring users and doing things that are terrible for security overall or being beholden to a SSL CA (and often although not always forking over money for this dubious privilege). Never mind the lack of true security due to the core SSL problem, this is not an attractive solution in general. Forcing mandatory HTTPS today means giving far too much power and influence to SSL CAs, often including the ability to turn off your website at their whim or mistake.

You might say that this proposal doesn't force mandatory HTTPS. That's disingenuous. Scaring users of a major browser when they visit a non-HTTPS site is effectively forcing HTTPS for the same reason that scary warnings about self-signed certificates force the use of official CA certificates. Very few websites can afford to scare users.

The time to force people towards HTTPS is when we've solved all of these problems. In other words, when absolutely any website can make itself a certificate and then securely advertise and use that certificate. We are nowhere near this ideal world in today's SSL CA environment (and we may or may not ever get there).

(By the way, I really mean any website here, including a hypothetical one run by anonymous people and hosted in some place that either no one likes or that generates a lot of fraud, or both. There are a lot of proposals that basically work primarily for people in the West who are willing to be officially identified and can provide money; depart from this and you can find availability going downhill rapidly. Read up on the problems of genuine Nigerian entrepreneurs someday.)

by cks at July 20, 2014 04:12 AM

Everything Sysadmin

Tom @ Austin Cloud Meetup

Do you live near Austin? I'll be speaking at the Austin Cloud Meetup in September. They're moving their meeting to coincide with my trip there to speak at the SpiceWorld conference. More info here: http://www.meetup.com/CloudAustin/events/195140322/

July 20, 2014 01:35 AM

July 19, 2014

Steve Kemp's Blog

Did you know xine will download and execute scripts?

Today I was poking around the source of Xine, the well-known media player. During the course of this poking I spotted that Xine has skin support - something I've been blissfully ignorant of for many years.

How do these skins work? You bring up the skin-browser, by default this is achieved by pressing "Ctrl-d". The browser will show you previews of the skins available, and allow you to install them.

How does Xine know what skins are available? It downloads the contents of:

NOTE: This is an insecure URL.

The downloaded file is a simple XML thing, containing references to both preview-images and download locations.

For example the theme "Sunset" has the following details:

  • Download link: http://xine.sourceforge.net/skins/Sunset.tar.gz
  • Preview link: http://xine.sourceforge.net/skins/Sunset.png

If you choose to install the skin, the Sunset.tar.gz file is downloaded via HTTP, extracted, and the shell-script doinst.sh is executed, if present.
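If you'd rather see what a skin is going to do before letting xine install it, you can fetch and unpack the archive by hand and read any install script first. A rough sketch, using the Sunset download link above:

curl -O http://xine.sourceforge.net/skins/Sunset.tar.gz
mkdir /tmp/skin-inspect
tar xzf Sunset.tar.gz -C /tmp/skin-inspect
find /tmp/skin-inspect -name doinst.sh -exec cat {} \;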

So if you control DNS on your LAN you can execute arbitrary commands if you persuade a victim to download your "corporate xine theme".

Probably a low-risk attack, but still a surprise.

July 19, 2014 08:48 PM

Chris Siebenmann

Some consequences of widespread use of OCSP for HTTPS

OCSP is an attempt to solve some of the problems of certificate revocation. The simple version of how it works is that when your browser contacts a HTTPS website, it asks the issuer of the site's certificate if the certificate is both known and still valid. One important advantage of OCSP over CRLs is that the CA now has an avenue to 'revoke' certificates that it doesn't know about. If the CA doesn't have a certificate in its database, it can assert 'unknown certificate' in reply to your query and the certificate doesn't work.

The straightforward implication of OCSP is that the CA knows that you're trying to talk to a particular website at a particular time. Often third parties can also know this, because OCSP queries may well be done over HTTP instead of HTTPS. OCSP stapling attempts to work around the privacy implications by having the website include a pre-signed, limited duration current attestation about their certificate from the CA, but it may not be widely supported.

(Website operators have to have software that supports OCSP stapling and specifically configure it. OCSP checking in general simply needs a field set in the certificate, which the CA generally forces on your SSL certificates if it supports OCSP.)
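If you're curious whether a particular site staples OCSP responses, one quick check is to ask for the certificate status during the TLS handshake and look at the OCSP response section of the output; the host name here is just a stand-in:

openssl s_client -connect www.example.org:443 -servername www.example.org -status < /dev/null

If the server doesn't staple, OpenSSL will report that no OCSP response was sent.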

The less obvious implication of OCSP is that your CA can now turn off your HTTPS website any time it either wants to, is legally required to, or simply screws up something in its OCSP database. If your browser checks OCSP status and the OCSP server says 'I do not know this certificate', your browser is going to hard-fail the HTTPS connection. In fact it really has to, because this is exactly the response that it would get if the CA had been subverted into issuing an imposter certificate in some way that was off the books.

You may be saying 'a CA would never do this'. I regret to inform my readers that I've already seen this happen. The blunt fact is that keeping high volume services running is not trivial and systems suffer database glitches all the time. It's just that with OCSP someone else's service glitch can take down your website, my website, or in fact a whole lot of websites all at once.

As they say, this is not really a good way to run a railroad.

(See also Adam Langley on why revocation checking doesn't really help with security. This means that OCSP is both dangerous and significantly useless. Oh, and of course it often adds extra latency to your HTTPS connections since it needs to do extra requests to check the OCSP status.)

PS: note that OCSP stapling doesn't protect you from your CA here. It can protect you from temporary short-term glitches that fix themselves automatically (because you can just hold on to a valid OCSP response while the glitch fixes itself), but that's it. If the CA refuses to validate your certificate for long enough (either deliberately or through a long-term problem), your cached OCSP response expires and you're up the creek.

by cks at July 19, 2014 04:23 AM

July 18, 2014

SysAdmin1138

Monitoring alert-streams are like twitter...

For some, every tweet is sacred. Each and every one is to be read, and caught up on if they've been away.

For some it's a stream of interesting, to be looked in upon when the whim strikes. There may be a list of really interesting people that gets checked more often.

For some it's ARG ARG NOISY GO AWAY ARG. This is why we have email.


For some sysadmin teams, every alert is sacred. To be acted upon the moment it arrives, and looked at when you've been away to be sure everything is handled.

For some sysadmin teams, the alerts are glanced at for suspicious patterns but otherwise ignored. Some really interesting ones may be forwarded by email-rules to phones.

For some sysadmin teams, the alerts are completely ignored. Problems are handled in email when other people notice them.


If you ever wondered how someone with a thousand twitter-follows can keep up, it's simple: they don't. It's lossy, it has to be. With that many follows you're dealing with a tweet every 30 seconds and you only keep up when you're bored. Or you make Lists of people you actually care about following, and only browse the master list when there isn't anything else going on.

The same dynamic holds true for a system with 600 individual servers with a variety of applications installed on them. Even if you only turn on OS basics like CPU/RAM/Swap/Disk, your alert stream is going to be very noisy. And like twitter there is going to be a lot of the server equivalent of, "I just had dinner, that... was a lot of food" tweets.

[Screenshots of routine alert emails from the HouWeb01, HouWeb02 and HouWeb06 servers]

So riveting. *thud*

When it comes to alerts you need to consider when you want to know something. Putting everything in email is a great way to ignore it, and maybe expose it to some rudimentary email-rule based noise-filters. But wouldn't it be better if that email stream were high quality? And you had an actual website or something you could query for the historical stuff? Yeah, that'd be great.

[Screenshot of an alert email from HouCluster]

Wait, what? Crap. Why?

Here is a nasty truth: I don't give a damn about CPU/RAM/Swap/Disk. Not in an ACT NOW, MONKEY! kind of way, anyway. I care about that stuff for trending and for historical troubleshooting. Think about the things we want to know and when we want to know them. Once you have an idea about that, you can start defining alerts that will look more like a well curated twitter-feed you don't want to miss, and less like the Alerts folder with 1269 unread messages in it.

So how can you make your alert stream more like a well curated feed, and not the firehose of noise it likely is?

Rule 1: Not all monitorables need a defined alert.

Just because you can track it doesn't mean you need to figure out an alert threshold for it and figure out what text to put into the email/sms. Definitely keep track of it if you have an operational need, but don't try to bother humans with it unless you have a definite need. "I might want to know once," is not definite, it's paranoia. Some systems, especially single points of failure, really do need that kind of alerting so set it up. Yes, please tell me that the 6-figure router I have one of is having a high CPU event, I want to know that. But please don't tell me about high NIC usage on the main database when the backups are staging to the archive system.

While there are people who really will skim through 800 tweets over breakfast in order to catch up with what happened overnight, and there are sysadmins who will read all 800 messages in the Alerts folder over breakfast, the rest of us look at that and go "AAAG information overload!" And skimming? That's great for things you need to know about eventually, but absolute crap when you need to react RIGHT BLOODY NOW.

Rule 2: Not everyone treats alerts the same way you do. Account for that.

You may look at every alert as it arrives and determine if it needs action, but your peers may be more of the "only tell me if something is actually wrong" variety, and have a different definition of "actually wrong". Alert systems are supported by people, so how those people work needs to be dealt with. Come to a consensus, and keep maintaining it. If you're an alerts-over-breakfast type, your cube-mate may be a page-me-if-anything-breaks type. The two of you need to figure out common ground.

Rule 3: Spend the effort to build the kind of alerts you actually need.

The out-of-the-box alerts are almost always.... boring. I rarely, if ever, want to know about high CPU/RAM/Swap/Disk events the moment they happen. Some, like disk-space, I should be picking up on trending reports. Yes, we do have spikes we need to deal with (dumping core on a 768GB RAM box? Tell me), and root/c: drive filling, but that should be a targeted alert not one that goes on everything.

Another boring alert? Pingable. It backstops the actual thing I'm worried about, whether or not TCP/8443 is serving SSL and returns an HTTP/200 status code, but by itself isn't something I want to get an alert on. The big difference is if my monitoring system is smart enough to figure out the "IF !pingable THEN AlertSuppressDependentServices" logic. In that case, I really want pingable because I can trust that I know I have a whole fist of services down based on a single alert, and I'm not getting a storm of messages about the down services on the box.
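To make that concrete, the kind of service-level check being described boils down to something like this sketch, where the host, port and path are stand-ins and a real monitoring system would wrap it in its own plugin conventions:

status=$(curl -sk -o /dev/null -w '%{http_code}' https://somehost:8443/)
if [ "$status" != "200" ]; then
    echo "CRITICAL: somehost:8443 returned HTTP $status"
    exit 2
fi
echo "OK: somehost:8443 answered over SSL with HTTP 200"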

Spend the time to build the alerts you actually want. If you're doing clustered services or horizontal scaling services you generally don't care much about single system outages, you care about whether or not the service is up and performing acceptably. These are harder alerts to build, but they're what you actually want in most cases.

Rule 4: Create different classes of alert based on urgency.

Some alerts are wake up a human right now alerts, and some are more "bother whoever is on duty right now" alerts. And some are "so long as someone gets to it within a few hours we're OK." This will probably require setting up some kind of on-call schedule with a push notice other than email, such as SMS or a mobile app. Email is good for a lot, but it is only as good as the built in filtering system's ability to isolate signal in a bunch of noise and forward to email-to-SMS gateways.

Sending everything to email and trusting each recipient has good enough mail filters to do the correct urgency handling doesn't scale. And isn't consistent.

Rule 4a: Format your alert text with 160 character-limits in mind.

The chances of your alerts ending up on mobile, filtered through SMS, is pretty high. Best to plan for that from the start.

Rule 5: Ask the right questions during post-mortems.

Post-mortem processes are there for a reason, and that's to do better next time. This is a perfect opportunity to identify improvements in your monitoring and alert systems! Some questions you should be asking:

  • Did the monitoring system pick up the problem?
  • Did we have an alert configured to notice the problem?
  • Did we react to the monitoring system, a customer report, or a sysadmin with a bad feeling about this?
  • What changes can we make to allow the alert system to notify us of this kind of problem?
  • What changes can we make to allow us to notice this event building up so we can deal with it before it becomes a problem?

I have seen many, many cases where the monitoring system DID pick up the problem and DID alert us to it, but no one noticed because it was one alert in a pile of 50 that arrived in a given hour. We reacted when a customer asked us about why their stuff was down. What were the other 49 messages? A few were side-effects of the problem and the rest were routine CPU/RAM/Swap/Disk high-usage notices that we all stopped paying attention to.

Oops.

For a very high profile event, that can have some serious consequences for a sysadmin team. You don't want those consequences.

Rule 6: Create some kind of trend-tracking system.

Trends allow you to get ahead of problems. It may be a weekly report with graphs of historical usage that gets mailed out for humans to look at and go, "Hm, that looks kinda bad," over. It may be an actual analytics system that puts in trend-lines and helpful, "vSphere cluster PROD will be 100% RAM in 39.4 days" text. Or it may be a weekly recurring helpdesk ticket to have someone look at charts for an hour.

Whatever it is, you need something or someone to keep track of trends. Put all those monitorables you're not alarming on to good use and get ahead of the ball for a change. Figure out 3 months before you run out of disk-space that you're going to need to add more. Notice that the latest hotfix has increased RAM usage across the web cluster by 17%. These are not know-right-now things, they're the kind of thing that is alarming, but on a scale of weeks not minutes. You need that kind of alerting too!


It's time to cull your twitter alert stream.

by SysAdmin1138 at July 18, 2014 04:33 PM

Standalone Sysadmin

Slackware is 21? Wow I'm old!

I saw that Slackware Linux is now officially old enough to drink (in the US, anyway). That's pretty amazing!

Patrick Volkerding sent out the initial announcement that Slackware was being worked on way back in 1993.

I didn't start using it then. Heck, I didn't even have a computer then! My first machine was an IBM Aptiva 486 DX2/66. Basically, this:

A friend gave me a copy of the 1996 version of the InfoMagic Linux Developer's Resource, and Slackware (3) was the only distribution I could get working on my machine. At around the same time, another friend's dad found me a copy of Sam's Teach Yourself UNIX in 24 Hours. And that was the genesis of my learning Linux.

I ran Slackware continually, on my desktops and servers, until around 2006, when I needed to do more enterprise-y things at work, and switched to RedHat based systems because they could auth against Active Directory, but I still have a lot of fondness in my heart for Slack.

A while back, I wrote about how instrumental Slackware was to my early knowledge, and from the Hacker News thread, I can see I'm not alone.

If you've never run it before and you'd like a taste, it's easy to get Slack and install it in a VM. You should do this, especially if you've never installed a Linux in anything other than an X-Windowed environment. Today, if you install a desktop Linux, the entire process is graphical, but Slack was never like that. The installation itself used curses, but once you rebooted into your fully functioning machine, you were at a text prompt, and you were expected to configure everything you needed from scratch. Ah, the good old days ;-)

by Matt Simmons at July 18, 2014 11:52 AM

Chris Siebenmann

In practice, 10G-T today can be finicky

Not all that long ago I wrote an entry about why I think that 10G-T will be the dominant form of 10G Ethernet. While I still believe in the fundamental premise of that entry, since then I've also learned that 10G-T today can be kind of finicky in practice (regardless of what the theory says) and this can potentially make 10G-T deployments harder to do and to get reliable than SFP-based ones.

So far we've had two noteworthy incidents. In the most severe one a new firewall refused to recognize link on either 10G-T interface when they were plugged into existing 1G switches. We have no idea why and haven't been able to reproduce the problem; as far as we can tell everything should work. But it didn't. Our on the spot remediation was to switch out the 10G-T card for a dual-1G card and continue on.

(Our tests afterwards included putting the actual card that had the problem into another server of the exact same model and connecting it up to test switches of the exact same model; everything worked.)

A less severe recent issue was finding that one 10G-T cable either had never worked or had stopped working (it was on a pre-wired but uninstalled machine, so we can't be sure). This was an unexceptional short cable from a reputable supplier and apparently it still works if you seat both ends really firmly (which makes it unsuitable for machine room use, where cables may well get tugged and that sort of thing). At one level I'm not hugely surprised by this; the reddit discussion of my previous entry had a bunch of people who commented that 10G-T could be quite sensitive to cabling issues. But it's still disconcerting to have it actually happen to us (and not with a long cable either).

To be clear, I don't regret our decision to go with 10G-T. Almost all of our 10G-T stuff is working and I don't think we could have afforded to do 10G at all if we'd had to use SFP+ modules. These teething problems are mild by comparison and I have no reason to think that they won't get better over time.

(But if you gave me buckets of money, well, I think that an all SFP+ solution is going to be more reliable today if you can afford it. And it clearly dissipates less power at the moment.)

by cks at July 18, 2014 05:28 AM

July 17, 2014

Chris Siebenmann

My (somewhat silly) SSD dilemma

The world has reached the point where I want to move my home machine from using spinning rust to using SSDs; in fact it's starting to reach the point where sticking with spinning rust seems dowdy and decidedly behind the times. I certainly would like extremely fast IO and no seek overheads and so on, especially when I do crazy things like rebuild Firefox from source on a regular basis. Unfortunately I have a dilemma because of a combination of three things:

  • I insist on mirrored disks for anything I value, for good reason.

  • I want the OS on different physical disks than my data because that makes it much easier to do drastic things like full OS reinstalls (my current system is set up this way).

  • SSDs are not big enough for me to fit all of my data on one SSD (and a bunch of my data is stuff that doesn't need SSD speeds but does need to stay online for good long-term reasons).

(As a hobbyist photographer who shoots in RAW format, the last is probably going to be the case for a long time to come. Photography is capable of eating disk space like popcorn, and it gets worse if you're tempted to get into video too.)

Even if I was willing to accept a non-mirrored system disk (which I'm reluctant to do), satisfying all of this in one machine requires five drives (three SSDs plus two HDs). Six drives would be better. That's a lot of drives to put in one case and to connect to one motherboard (especially given that an optical drive will require a SATA port these days and yes, I probably still want one).

(I think that relatively small SSDs are now cheap enough that I'd put the OS on SSDs for both speed and lower power. This is contrary to my view two years ago, but times change.)

There are various ways to make all of this fit, such as pushing the optical drive off to an external USB drive and giving up on separate system disk(s), but a good part of my dilemma is that I don't really like any of them. In part it feels like I'm trying to force a system design that is not actually ready yet and what I should be doing is, say, waiting for SSD capacities to go up another factor of two and the prices to drop a bunch more.

(I also suspect that we're going to see more and more mid-tower cases that are primarily designed for 2.5" SSDs, although casual checking suggests that one can get cases that will take a bunch of them even as it stands.)

In short: however tempting SSDs seem, right now it seems like we're in the middle of an incomplete technology transition. However much I'd vaguely like some, I'm probably better off waiting for another year or two or three. How fortunate that this matches my general lethargy about hardware purchases (although there's what happened with my last computer upgrade to make me wonder).

(My impression is that we're actually in the middle of several PC technology transitions. 4K monitors and 'retina' displays seem like another current one, for example, one that I'm quite looking forward to.)

by cks at July 17, 2014 03:52 AM

Everything Sysadmin

IPv6 is real... and mentioned on The Colbert Report

Did you catch Vint Cerf on The Colbert Report last night?

He talks about Al Gore's involvement in funding the NSFNET, and the need for IPv6 deployments.

Vint handles Colbert brilliantly. He blows Colbert out of the water in a few places.

July 17, 2014 12:15 AM

July 16, 2014

Steve Kemp's Blog

So what can I do for Debian?

So I recently announced my intention to rejoin the Debian project, having been a member between 2002 & 2011 (inclusive).

In the past I resigned mostly due to lack of time, and what has changed is that these days I have more free time - primarily because my wife works in accident & emergency and has "funny shifts". This means we spend many days and evenings together, then she might work 8pm-8am for three nights in a row, which then becomes Steve-time, and can involve lots of time browsing reddit, coding obsessively, and watching bad TV (currently watching "Lost Girl". Shades of Buffy/Blood Ties/similar. Not bad, but not great.)

My NM-progress can be tracked here, and once accepted I have a plan for my activities:

  • I will minimally audit every single package running upon any of my personal systems.
  • I will audit as many of the ITP-packages I can manage.
  • I may, or may not, actually package software.

I believe this will be useful, even though there will be limits - I've no patience for PHP and will just ignore it, along with its ecosystem, for example.

As progress today I reported #754899 / CVE-2014-4978 against Rawstudio, and discussed some issues with ITP: tiptop (the program seems semi-expected to be installed setuid(0), but if it is then it will allow arbitrary files to be truncated/overwritten via "tiptop -W /path/to/file").

(ObRandom still waiting for a CVE identifier for #749846/TS-2867..)

And now sleep.

July 16, 2014 08:49 PM

[bc-log]

Using salt-api to integrate SaltStack with other services

Recently I have been looking for ways to allow external tools and services to perform corrective actions across my infrastructure automatically. As an example, I want to allow a monitoring tool to monitor my nginx availability and if for whatever reason nginx is down I want that monitoring tool/service to do something to fix it.

While I was looking at how to implement this, I remembered that SaltStack has an API and that API can provide exactly the functionality I wanted. The below article will walk you through setting up salt-api and configuring it to allow third party services to initiate SaltStack executions.

Install salt-api

The first step to getting started with salt-api is to install it. In this article we will be installing salt-api on a single master server; it is possible to run salt-api in a multi-master setup, however you will need to ensure the configuration is consistent across master servers.

Installation of the salt-api package is pretty easy and can be performed with apt-get on Debian/Ubuntu or yum on Red Hat variants.

# apt-get install salt-api

Generate a self signed certificate

By default salt-api utilizes HTTPS and while it is possible to disable this, it is generally not a good idea. In this article we will be utilizing HTTPS with a self signed certificate.

Depending on your intended usage of salt-api you may want to generate a certificate that is signed by a certificate authority. If you are only using salt-api internally, a self signed certificate should be acceptable.

Generate the key

The first step in creating any SSL certificate is to generate a key; the below command will generate a 4096 bit key.

# openssl genrsa -out /etc/ssl/private/key.pem 4096

Sign the key and generate a certificate

After we have generated the key we can generate the certificate with the following command.

# openssl req -new -x509 -key /etc/ssl/private/key.pem -out /etc/ssl/private/cert.pem -days 1826

The -days option allows you to specify how many days this certificate is valid; the above command should generate a certificate that is valid for about 5 years.
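If you want to double check what was generated before pointing salt-api at it, you can inspect the certificate directly.

# openssl x509 -in /etc/ssl/private/cert.pem -noout -dates -subject

The notBefore and notAfter lines in the output should show roughly a five year validity window.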

Create the rest_cherrypy configuration

Now that we have installed salt-api and generated an SSL certificate we can start configuring the salt-api service. In this article we will be placing the salt-api configuration into /etc/salt/master.d/salt-api.conf. You can also put this configuration directly in the /etc/salt/master configuration file; however, I prefer putting the configuration in master.d as I believe it allows for easier upkeep and better organization.

# vi /etc/salt/master.d/salt-api.conf

Basic salt-api configuration

Insert

rest_cherrypy:
  port: 8080
  host: 10.0.0.2
  ssl_crt: /etc/ssl/private/cert.pem
  ssl_key: /etc/ssl/private/key.pem

The above configuration enables a very basic salt-api instance, which utilizes SaltStack's external authentication system. This system requires requests to authenticate with user names and passwords or tokens; in reality not every system can or should utilize those methods of authentication.

Webhook salt-api configuration

Most systems and tools however do support utilizing a pre-shared API key for authentication. In this article we are going to use a pre-shared API key to authenticate our systems with salt-api. To do this we will add the webhook_url and webhook_disable_auth parameters.

Insert

rest_cherrypy:
  port: 8080
  host: <your hosts ip>
  ssl_crt: /etc/ssl/private/cert.pem
  ssl_key: /etc/ssl/private/key.pem
  webhook_disable_auth: True
  webhook_url: /hook

The webhook_url configuration parameter tells salt-api to listen for requests to the specified URI. The webhook_disable_auth configuration allows you to disable the external authentication requirement for the webhook URI. At this point in our configuration there is no authentication on that webhook URI, and any request sent to that webhook URI will be posted to SaltStack's event bus.
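One easy way to see this in action is to watch the event bus on the master while sending a test request to the webhook URL. The example below assumes that your Salt release ships the state.event runner and that the salt-api service is already running (starting it is covered further down).

# salt-run state.event pretty=True

From another terminal, send a request to any path underneath the webhook URL and you should see a corresponding salt/netapi/hook/... event printed, with the POST data included.

# curl -k -d foo=bar https://10.0.0.2:8080/hook/test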

Actioning webhook requests with Salt's Reactor system

SaltStack's event system is an internal notification system for SaltStack; it allows external processes to listen for SaltStack events and potentially act on them. A few examples of events that are published to this event system are jobs, authentication requests by minions, as well as salt-cloud related events. A common consumer of these events is the SaltStack Reactor system; the Reactor system allows you to listen for events and perform specified actions when those events are seen.

Reactor configuration

We will be utilizing the Reactor system to listen for webhook events and execute specific commands when we see them. Reactor definitions need to be included in the master configuration; for this article we will define our configuration within the master.d directory.

# vi /etc/salt/master.d/reactor.conf

Insert

reactor:
  - 'salt/netapi/hook/services/restart':
    - /salt/reactor/services/restart.sls

The above configuration will listen for events that are tagged with salt/netapi/hook/services/restart, which would mean any API requests targeting https://example.com:8080/hook/services/restart. When those events occur it will execute the items within /salt/reactor/services/restart.sls.

Defining what happens

Within the restart.sls file, we will need to define what we want to happen when the /hook/services/restart URI is requested.

# vi /salt/reactor/services/restart.sls

Simply restarting a service

To perform our simplest use case, restarting nginx when we get a request to /hook/services/restart, you could add the following.

Insert

restart_services:
  cmd.service.restart:
    - tgt: 'somehost'
    - arg:
      - nginx

This configuration is extremely specific and doesn't leave much chance for someone to exploit it for malicious purposes. This configuration is also extremely limited.

Restarting the specified services on the specified servers

When a salt-api webhook URL is called, the POST data being sent with that request is included in the event message. In the below configuration we will be using that POST data to allow requests to the webhook URL to specify both the target servers and the service to be restarted.

Insert

{% set postdata = data.get('post', {}) %}

restart_services:
  cmd.service.restart:
    - tgt: '{{ postdata.tgt }}'
    - arg:
      - {{ postdata.service }}

In the above configuration we are taking the POST data sent to the URL and assigning it to the postdata object. We can then use that object to provide the tgt and arg values. This allows the same URL and webhook call to be used to restart any service on any or all minion nodes.

Since we disabled external authentication on the webhook URL there is currently no authentication with the above configuration. This means that anyone who knows our URL could send a well formatted request and restart any service on any minion they want.

Using a secret key to authenticate

To secure our webhook URL a little better we can add a few lines to our reactor configuration that require requests to this reactor to include a valid secret key before executing.

Insert

{% set postdata = data.get('post', {}) %}

{% if postdata.secretkey == "replacethiswithsomethingbetter" %}
restart_services:
  cmd.service.restart:
    - tgt: '{{ postdata.tgt }}'
    - arg:
      - {{ postdata.service }}
{% endif %}

The above configuration requires that the requesting service include a POST key of secretkey and the value of that key must be replacethiswithsomethingbetter. If the requester does not, the Reactor will simply perform no steps. The secret key in this configuration should be treated like any other API key, it should be validated before performing any action and if you have more than one system/user making requests you should ensure that only the correct users have the ability to execute the commands specified in the reactor SLS file.

Restarting salt-master and Starting salt-api

Before testing our new salt-api actions we will need to restart the salt-master service and start the salt-api service.

# service salt-master restart
# service salt-api start

Testing our configuration with curl

Once the two above services have finished restarting we can test our configuration with the following curl command.

# curl -H "Accept: application/json" -d tgt='*' -d service="nginx" -d secretkey="replacethiswithsomethingbetter" -k https://salt-api-hostname:8080/hook/services/restart

If the configuration is correct you should expect to see a message stating success.

Example

# curl -H "Accept: application/json" -d tgt='*' -d service="nginx" -d secretkey="replacethiswithsomethingbetter" -k https://10.0.0.2:8080/hook/services/restart
{"success": true}

At this point you can now integrate any third party tool or service that can send webhook requests with your SaltStack implementation. You can also use salt-api to have home grown tools initiate SaltStack executions.

Additional Considerations

Other netapi modules

In this article we implemented salt-api with the cherrypy netapi module, which is one of three options that SaltStack gives you. If you are more familiar with other application servers such as WSGI-based ones, then I would suggest looking through the documentation to find the appropriate module for your environment.

Restricting salt-api

While in this article we added a secret key for authentication and SSL certificates for encrypted HTTPS traffic, there are additional options that can be used to restrict the salt-api service further. If you are implementing the salt-api service on a non-trusted network, it is a good idea to use tools such as iptables to restrict which hosts/IPs are able to utilize the salt-api service.

In addition, when implementing salt-api it is a good idea to think carefully about what you allow systems to request. A simple example of this would be the cmd.run module within SaltStack. If you allowed a server to perform dynamic cmd.run requests via salt-api and a malicious user was able to bypass the implemented restrictions, that user would theoretically be able to run any arbitrary command on your systems.

It is always best to only allow specific commands through the API system and avoid allowing potentially dangerous commands such as cmd.run.


Originally Posted on BenCane.com: Go To Article

by Benjamin Cane at July 16, 2014 05:00 PM