Planet SysAdmin


October 30, 2014

Standalone Sysadmin

Repping for SysAdmins at the Space Coast! #NASASocial

Like most children, I was a huge fan of spaceships when I was little. I remember watching Space Shuttle launches on TV, and watching interviews with astronauts in school. I always wanted to go see a launch in person, but that was hard to do when you were a kid in West Virginia. As I got older, I might have found other interests, but I never lost my love of space, technology, sci-fi, and the merging of all of those things. When I took a year away from system administration, the first hobby I picked up was model rocketry. I didn't really see any other option; it was just natural.

Well, a while back, I saw a post from one of the NASA Social accounts about how they were inviting social media people to come to the initial test launch of the Orion spacecraft in December. I thought... "Hey, I'm a social media people...I should try to get into this!". I don't spend a TON of time talking about space-related activities here, since this is a system administration blog, but I do merge my interests as often as possible, like with my post on instrumentation, or Kerbal Space System Administration, and I understand that I'm not alone in having these two interests. I suspected that, if I were accepted to this program, it would be of interest to my readers (meaning: you).

Well, this morning, I got the email. I'm accepted. How awesome is that?
(Hint: Very Awesome.)

So, at the beginning of December, I will be heading to Kennedy Space Center to attend a two-day event, where I'll get tours and talk to engineers and administrators, and get very up close and cozy with the space program, and see the Orion launch in person. Literally a lifelong dream. I'm so excited, you've got no idea. Really, you haven't. I'm not even sure it's hit me yet.

The code for this mission is EFT-1, for Exploration Flight Test 1. This is the crew module that will take humanity to deep space. This test flight's profile sends the capsule 5,800km (3,600 miles) into space (the International Space Station orbits at around 330km), then has it re-enter the atmosphere at 32,000km/h (20,000mph) and be slowed down by friction on its heat shield and by 11 parachutes. The entire mission takes about 4 hours.

If you follow me on any of my social media accounts, you can prepare to see a lot of space stuff soonish. If you're not interested, I'm sorry about that, and I won't take it personally if you re-follow later in December or next year. But you should stick around, because it's going to be a really fun trip. And I'm going to be blogging here as well, of course.

If you don't already, follow me on twitter, Facebook, Instagram, Flickr, and Google+. You can follow the conversation about this event by using the #NASASocial and #Orion hashtags.

So thank you all for reading my blog, for following me on social media, and for making it possible for me to do awesome things and share them with you. It's because of you that I can do stuff like this, and I'm eternally grateful. If you have any special requests for aspects of this mission, or of my experiences, that you'd like me to cover, please comment below and let me know. I can't promise anything, but I can try to make it happen. Thanks again.

by Matt Simmons at October 30, 2014 01:20 PM

Chris Siebenmann

Quick notes on the Linux iptables 'ipset' extension

For a long time Linux's iptables firewall had an annoying lack in that it had no way to do efficient matching against a set of IP addresses. If you had a lot of IP addresses to match things against (for example if you were firewalling hundreds or thousands of IP addresses and IP address ranges off from your SMTP port), you needed one iptables rule for each entry and then they were all checked sequentially. This didn't make your life happy, to put it one way. In modern Linuxes, ipsets are finally the answer to this; they give you support for efficient sets of various things, including random CIDR netblocks.

(This entry suggests that ipsets only appeared in mainline Linux kernels as of 2.6.39. Ubuntu 12.04, 14.04, Fedora 20, and RHEL/CentOS 7 all have them while RHEL 5 appears to be too old.)

To work with ipsets, the first thing you need is the user level tool for creating and manipulating them. For no particularly sensible reason your Linux distribution probably doesn't install this when you install the standard iptables stuff; instead you'll need to install an additional package, usually called ipset. Iptables itself contains the code to use ipsets, but without ipset to create the sets you can't actually install any rules that use them.

(I wish I was kidding about this but I'm not.)

The basic use of ipsets is to make a set, populate it, and match against it. Let's take an example:

ipset create smtpblocks hash:net counters
ipset add smtpblocks 27.112.32.0/19
ipset add smtpblocks 204.8.87.0/24
iptables -A INPUT -p tcp --dport 25 -m set --match-set smtpblocks src -j DROP

(Both entries are currently on the Spamhaus EDROP list.)

Note that the set must exist before you can add iptables rules that refer to it. The ipset manpage has a long discussion of the various types of sets that you can use and the iptables-extensions manpage has a discussion of --match-set and the SET target for adding entries to sets from iptables rules. The hash:net I'm using here holds random CIDR netblocks (including /32s, ie single hosts) and is set to have counters.

It would be nice if there was a simple command to get just a listing of the members of an ipset. Unfortunately there isn't, as plain 'ipset list' insists on outputting a few lines of summary information before it lists the members. Since I don't know if these are constant I'm using 'ipset list -t save | grep "^add "', which seems ugly but seems likely to keep working forever.

Unfortunately I don't think there's an officially supported and documented ipset command for adding multiple entries into a set at once in a single command invocation; instead you're apparently expected to run 'ipset add ...' repeatedly. You can abuse the 'ipset restore' command for this if you want to by creating appropriately formatted input; check the output of 'ipset save' to see what it needs to look like. This may even be considered a stable interface by the ipset authors.
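For illustration, here's roughly what that abuse looks like. This is only a sketch based on the 'ipset save' format described above (using the same example set and entries); check what your own version of ipset emits before relying on it:

ipset save smtpblocks
# feed the same format back in; -exist makes re-running it harmless
ipset restore -exist <<'EOF'
create smtpblocks hash:net counters
add smtpblocks 27.112.32.0/19
add smtpblocks 204.8.87.0/24
EOF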

Ipset syntax and usage appears to have changed over time, so old discussions of it that you find online may not work quite as written (and someday these notes may be out of date that way as well).

PS: I can sort of see a lot of clever uses for ipsets, but I've only started exploring them right now and my iptables usage is fairly basic in general. I encourage you to read the ipset manpage and go wild.

Sidebar: how I think you're supposed to use list sets

As an illustrated example:

ipset create spamhaus-drop hash:net counters
ipset create spamhaus-edrop hash:net counters
[... populate both from spamhaus ...]

ipset create spamhaus list:set
ipset add spamhaus spamhaus-drop
ipset add spamhaus spamhaus-edrop

iptables -A INPUT -p tcp --dport 25 -m set --match-set spamhaus src -j DROP

This way your iptables rules can be indifferent about exactly what goes into the 'spamhaus' ipset, although of course this will be slightly less efficient than checking a single merged set.

by cks at October 30, 2014 03:31 AM

October 29, 2014

Steve Kemp's Blog

A brief introduction to freebsd

I've spent the past thirty minutes installing FreeBSD as a KVM guest. This mostly involved fetching the ISO (I chose the latest stable release 10.0), and accepting all the defaults. A pleasant experience.

As I'm running KVM inside screen I wanted to see the boot prompt, etc, via the serial console, which took two distinct steps:

  • Enabling the serial console - which lets boot stuff show up
  • Enabling a login prompt on the serial console in case I screw up the networking.

To configure boot messages to display via the serial console, issue the following command as the superuser:

 # echo 'console="comconsole"' >> /boot/loader.conf

To get a login: prompt you'll want to edit /etc/ttys and change "off" to "on" and "dialup" to "vt100" for the ttyu0 entry. Once you've done that reload init via:

 # kill -HUP 1
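For reference, the ttyu0 line in /etc/ttys ends up looking roughly like this. This is a sketch; keep whatever getty arguments your stock entry already has:

 # before (stock):
 #   ttyu0  "/usr/libexec/getty 3wire.9600"  dialup  off  secure
 # after:
 ttyu0  "/usr/libexec/getty 3wire.9600"  vt100  on   secure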

Enable remote root logins, if you're brave, or disable PAM and password authentication if you're sensible:

 vi /etc/ssh/sshd_config
 /etc/rc.d/sshd restart
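The sshd_config knobs in question are roughly these (a sketch; pick one approach or the other):

 # brave: allow root logins over ssh
 PermitRootLogin yes

 # sensible: keys only, no passwords or PAM
 PasswordAuthentication no
 ChallengeResponseAuthentication no
 UsePAM no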

Configure the system to allow binary package installation - to be honest I was hazy on why this was required, but I ran the two commands and it all worked out:

 pkg
 pkg2ng

Now you may install a package via a simple command such as:

 pkg add screen

Removing packages you no longer want is as simple as using the delete option:

 pkg delete curl

You can see installed packages via "pkg info", and there are more options to be found via "pkg help". In the future you can apply updates via:

 pkg update && pkg upgrade

Finally I've installed 10.0-RELEASE which can be upgraded in the future via "freebsd-update" - this seems to boil down to "freebsd-update fetch" and "freebsd-update install", but I'm hazy on that just yet. For the moment you can see your installed version via:

 uname -a ; freebsd-version

Expect my future CPAN releases, etc, to be tested on FreeBSD too now :)

October 29, 2014 06:37 PM

Everything Sysadmin

2015 speaking gigs: Boston, Pennsylvania, Baltimore

Three new speaking gigs have been announced. January (BBLISA in Cambridge, MA), February (Bucks County, PA), and March (Baltimore-area). The full list is on http://the-cloud-book.com/book-tour.html; subscribe to the RSS feed to learn about any new speaking engagements.

The next 3 speaking gigs are always listed in the "see us live" box at the top of http://EverythingSysadmin.com.

October 29, 2014 03:00 PM

Standalone Sysadmin

Appearance on @GeekWhisperers Podcast!

I was very happy to visit Austin, TX not long ago to speak at the Spiceworld Austin conference, held by Spiceworks. Conferences like that are a great place to meet awesome people you only talk to on the internet and to see old friends.

Part of the reason I was so excited to go was because John Mark Troyer had asked me if I wanted to take part in the Geek Whisperers Podcast. Who could say no?

With over 60 episodes to their name, Geek Whisperers fills an amazing niche of enterprise solutions, technical expertise, and vendor luminaries that spans every market that technology touches. Hosted by John, Amy "CommsNinja" Lewis, and fellow Bostonite Matt Brender (well, he lives in Cambridge, but that’s close, right?), the show has been telling great tales and having a good time doing it for years. I’ve always respected their work, and I was absolutely touched that they wanted to have me on the show.

We met on the Tuesday of the conference and sat around for an hour talking about technology, tribes, and the progression of people and infrastructure. I had a really good time, and I hope they did, too.

You can listen to the full podcast on Geek-Whisperers.com or through iTunes or Stitcher.

Please comment below if you have any questions about things we discussed. Thanks for listening!

by Matt Simmons at October 29, 2014 02:23 PM

Chris Siebenmann

Unnoticed nonportability in Bourne shell code (and elsewhere)

In response to my entry on how Bashisms in #!/bin/sh scripts aren't necessarily bugs, FiL wrote:

If you gonna use bashism in your script why don't you make it clear in the header specifying #!/bin/bash instead [of] #!/bin/sh? [...]

One of the historical hard problems for Unix portability is people writing non-portable code without realizing it, and Bourne shell code is no exception. This is true for even well intentioned people writing code that they want to be portable.

One problem, perhaps the root problem, is that very little you do on Unix will come with explicit (non-)portability warnings and you almost never have to go out of your way to use non-portable features. This makes it very hard to know whether or not you're actually writing portable code without trying to run it on multiple environments. The other problem is that it's often both hard to remember and hard to discover what is non-portable versus what is portable. Bourne shell programming is an especially good example of both issues (partly because Bourne shell scripts often use a lot of external commands), but there have been plenty of others in Unix's past (including 'all the world's a VAX' and all sorts of 64-bit portability issues in C code).

So one answer to FiL's question is that a lot of people are using bashisms in their scripts without realizing it, just as a lot of people have historically written non-portable Unix C code without intending to. They think they're writing portable Bourne shell scripts, but because their /bin/sh is Bash and nothing in Bash warns about these things, the issues sail right by. Then one day you wind up changing /bin/sh to be Dash and all sorts of bits of the world explode, sometimes in really obscure ways.

All of this sounds abstract, so let me give you two examples of accidental Bashisms I've committed. The first and probably quite common one is using '==' instead of '=' in '[ ... ]' conditions. Many other languages use == as their string equality check, so at some point I slipped and started using it in 'Bourne' shell scripts. Nothing complained, everything worked, and I thought my shell scripts were fine.
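As a concrete sketch (with a made-up variable name), the difference looks like this:

# accepted by Bash even when run as /bin/sh, but a strict POSIX shell
# such as Dash typically rejects '==' with an 'unexpected operator' error:
if [ "$answer" == "yes" ]; then echo matched; fi

# the portable spelling:
if [ "$answer" = "yes" ]; then echo matched; fi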

The second I just discovered today. Bourne shell pattern matching allows character classes, using the usual '[...]' notation, and it even has negated character classes. This means that you can write something like the following to see if an argument has any non-number characters in it:

case "$arg" in
   *[^0-9]*) echo contains non-number; exit 1;;
esac

Actually I lied in that code. Official POSIX Bourne shell doesn't negate character classes with the usual '^' character that Unix regular expressions use; instead it uses '!'. But Bash accepts '^' as well. So I wrote code that used '^', tested it, had it work, and again didn't realize that I was non-portable.

(Since having a '^' in your character class is not an error in a POSIX Bourne shell, the failure mode for this one is not a straightforward error.)

This is also a good example of how hard it is to test for non-portability, because even when you use 'set -o posix' Bash still accepts and matches this character class in its way (with '^' interpreted as class negation). The only way to test or find this non-portability is to run the script under a different shell entirely. In fact, the more theoretically POSIX compatible shells you test on the better.

(In theory you could try to have a perfect memory for what is POSIX compliant and not need any testing at all, or cross-check absolutely everything against POSIX and never make a mistake. In practice humans can't do that any more than they can write or check perfect code all the time.)

by cks at October 29, 2014 04:43 AM

October 28, 2014

Everything Sysadmin

Apple Pay and CurrentC

I predict one year from today CurrentC won't be up and running and, in fact, history will show it was just another attempt to stall and prevent any kind of mobile payment system in the U.S. from being a success. I'm not saying that there won't be NFC payment systems, just that they'll be marginalized and virtually useless as a result.

October 28, 2014 09:18 PM

Chris Siebenmann

My current somewhat tangled feelings on operator.attrgetter

In a comment on my recent entry on sort comparison functions, Peter Donis asked a good question:

Is there a reason you're not using operator.attrgetter for the key functions? It's faster than a lambda.

One answer is that until now I hadn't heard of operator.attrgetter. Now that I have it's something I'll probably consider in the future.

But another answer is embedded in the reason Peter Donis gave for using it. Using operator.attrgetter is clearly a speed optimization, but speed isn't always the important thing. Sometimes, even often, the most important thing to optimize is clarity. Right now, for me attrgetter is less clear than the lambda approach because I've just learned about it; switching to it would probably be a premature optimization for speed at the cost of clarity.

In general, well, 'attrgetter' is a clear enough thing that I suspect I'll never be confused about what 'lst.sort(key=operator.attrgetter("field"))' does, even if I forget about it and then reread some code that uses it; it's just pretty obvious from context and the name itself. There's a visceral bit of me that doesn't like it as much as the lambda approach because I don't think it reads as well, though. It's also more black magic than lambda, since lambda is a general language construct and attrgetter is a magic module function.

(And as a petty thing it has less natural white space. I like white space since it makes things more readable.)
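For concreteness, here's a minimal sketch of the two forms being weighed against each other (the record type and field name are made up for the example):

import operator
from collections import namedtuple

Rec = namedtuple("Rec", ["field"])
lst = [Rec(3), Rec(1), Rec(2)]

# the lambda version:
lst.sort(key=lambda r: r.field)
# the operator.attrgetter version:
lst.sort(key=operator.attrgetter("field"))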

On the whole this doesn't leave me inclined to switch to using attrgetter for anything except performance sensitive code (which these sort()s aren't so far). Maybe this is the wrong decision, and if the Python community as a whole adopts attrgetter as the standard and usual way to do .sort() key access it certainly will become a wrong decision. At that point I hope I'll notice and switch myself.

(This is in a sense an uncomfortable legacy of CPython's historical performance issues with Python code. Attrgetter is clearly a performance hack in general; if lambda were just as fast as it I'd argue that you should clearly use lambda because it's a general language feature instead of a narrowly specialized one.)

by cks at October 28, 2014 04:12 AM

October 27, 2014

Everything Sysadmin

Wait, did you mean Wed the 15th or Thu the 16th?

How many times have you seen this happen?

Email goes out that mentions a date like "Wed, Oct 16". Since Oct 16 is a Thursday, not a Wednesday (this year), there is a flurry of email asking, "Did you mean Wed the 15th or Thu the 16th?" A correction goes out but the damage is done. Someone invariably "misses the update" and shows up a day early or late, or is otherwise inconvenienced. Either way cognitive processing is wasted for everyone involved.

The obvious solution is "people should proofread better" but it is a mistake that everyone makes. I see the mistake at least once a month, and sometimes I'm the guilty party.

If someone could solve this problem it would be a big win.

Google's gmail will warn you if you use the word "attachment" and don't attach a file. Text editing boxes in all modern web browsers and operating systems have some kind of live spell-check that puts a red mark under a word that is misspelled. Some do real-time grammar checking too.

How hard would it be to add a check for "Wed, Oct 16" and similar errors? Yes, there are many date formats, and in some cases one would have to guess the year.
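The core check itself is tiny. Here's a toy sketch in Python, handling just one of those many formats and assuming the year:

import datetime

def weekday_matches(text, year=2014):
    # e.g. "Wed, Oct 16" -> False, because Oct 16 2014 is a Thursday
    day_name, rest = text.split(", ", 1)
    date = datetime.datetime.strptime("%s %d" % (rest, year), "%b %d %Y")
    return date.strftime("%a") == day_name

print(weekday_matches("Wed, Oct 16"))   # False
print(weekday_matches("Thu, Oct 16"))   # True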

It would also be nice if we could write "FILL, Oct 16" and the editor would fill in the day of the week. Or a context-sensitive menu (i.e. the left click menu) would offer to add the day of the week for you. If the time is included, it should offer to link to timeanddate.com.

Ok Gmail, Chrome, Apple and Microsoft: Who's going to be the first to implement this?

October 27, 2014 03:00 PM

Chris Siebenmann

Practical security and automatic updates

One of the most important contributors to practical, real world security is automatically applied updates. This is because most people will not take action to apply security fixes; in fact most people will probably not do so even if asked directly and just required to click 'yes, go ahead'. The more work people have to go through to apply security fixes, the fewer people will do so. Ergo you maximize security fixes when people are required to take no action at all.

(Please note that sysadmins and developers are highly atypical users.)

But this relies on users being willing to automatically apply updates, and that in turn requires that updates must be harmless. The ideal update either changes nothing besides fixing security issues and other bugs or improves the user's life. Updates that complicate the user's life at the same time that they deliver security fixes, like Firefox updates, are relatively bad. Updates that actually harm the user's system are terrible.

Every update that does harm to someone's system is another impetus for people to disable automatic updates. It doesn't matter that most updates are harmless and it doesn't matter that most people aren't affected by even the harmful updates, because bad news is much more powerful than good news. We hear loudly about every update that has problems; we very rarely hear about updates that prevented problems, partly because it's hard to notice when it happens.

(The other really important thing to understand is that mythology is extremely powerful and extremely hard to dislodge. Once mythology has set in that leaving automatic updates on is a good way to get screwed, you have basically lost; you can expect to spend huge amounts of time and effort persuading people otherwise.)

If accidentally harmful updates are bad, actively malicious updates are worse. An automatic update system that allows malicious updates (whether the maliciousness is the removal of features or something worse) is one that destroys trust in it and therefore destroys practical security. As a result, malicious updates demand an extremely strong and immediate response. Sadly they often don't receive one, and especially when the 'update' removes features it's often even defended as a perfectly okay thing. It's not.

PS: corollaries for, say, Firefox and Chrome updates are left as an exercise to the reader. Bear in mind that for many people their web browser is one of the most crucial parts of their computer.

(This issue is why people are so angry about FTDI's malicious driver appearing in Windows Update (and FTDI has not retracted their actions; they promise future driver updates that are almost as malicious as this one). It's also part of why I get so angry when Unix vendors fumble updates.)

by cks at October 27, 2014 05:42 AM

October 26, 2014

Security Monkey

Recovering Data from FAT Filesystems using TestDisk on Linux

This seems to be a pretty popular topic lately, and a recent ping from a good friend on IRC got me thinking that perhaps a blog post was warranted.

 

There are more "file recovery" utilities available for download now than ever before, and they are all of varying quality to some extent.  If you're on a Windows platform and need to quickly recover some files that you deleted from that handy-dandy thumb drive, you have a plethora of options like UnEraser, Active Undel

October 26, 2014 07:36 PM

Chris Siebenmann

Things that can happen when (and as) your ZFS pool fills up

There's a shortage of authoritative information on what actually happens if you fill up a ZFS pool, so here is what I've both gathered about it from other people's information and experienced.

The most often cited problem is bad performance, with the usual cause being ZFS needing to do an increasing amount of searching through ZFS metaslab space maps to find free space. If not all of these are in memory, a write may require pulling some or all of them into memory, searching through them, and perhaps finding not enough space. People cite various fullness thresholds for this starting to happen, eg anywhere from 70% full to 90% full. I haven't seen any discussion about how severe this performance impact is supposed to be (and on what sort of vdevs; raidz vdevs may behave differently than mirror vdevs here).

(How many metaslabs you have turns out to depend on how your pool was created and grown.)

A nearly full pool can also have (and lead to) fragmentation, where the free space is in small scattered chunks instead of large contiguous runs. This can lead to ZFS having to write 'gang blocks', which are a mechanism where ZFS fragments one large logical block into smaller chunks (see eg the mention of them in this entry and this discussion which corrects some bits). Gang blocks are apparently less efficient than regular writes, especially if there's a churn of creation and deletion of them, and they add extra space overhead (which can thus eat your remaining space faster than expected).

If a pool gets sufficiently full, you stop being able to change most filesystem properties; for example, to set or modify the mountpoint or change NFS exporting. In theory it's not supposed to be possible for user writes to fill up a pool that far. In practice all of our full pools here have resulted in being unable to make such property changes (which can be a real problem under some circumstances).

You are supposed to be able to remove files from a full pool (possibly barring snapshots), but we've also had reports from users that they couldn't do so and their deletion attempt failed with 'No space left on device' errors. I have not been able to reproduce this and the problem has always gone away on its own.

(This may be due to a known and recently fixed issue, Illumos bug #4950.)

I've never read reports of catastrophic NFS performance problems for all pools or total system lockup resulting from a full pool on an NFS fileserver. However both of these have happened to us. The terrible performance issue only happened on our old Solaris 10 update 8 fileservers; the total NFS stalls and then system lockups have now happened on both our old fileservers and our new OmniOS based fileservers.

(Actually let me correct that; I've seen one report of a full pool killing a modern system. In general, see all of the replies to my tweeted question.)

By the way: if you know of other issues with full or nearly full ZFS pools (or if you have additional information here in general), I'd love to know more. Please feel free to leave a comment or otherwise get in touch.

by cks at October 26, 2014 05:36 AM

Raymii.org

Keepalived notify script, execute action on failover

Keepalived supports running scripts on VRRP state change. This can come in handy when you need to execute an action when a failover occurs. In my case, I have a VPN running on a Virtual IP and want to make sure the VPN only runs on the node with the Virtual IP.
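As a rough sketch of the mechanism (the script path, instance details, and the VPN service command here are made up; keepalived passes the script the type, name, and new state), the notify hook looks something like this in keepalived.conf:

vrrp_instance VI_1 {
    # ... interface, virtual_router_id, priority, virtual_ipaddress ...
    notify /usr/local/bin/vpn-failover.sh
}

and the notify script itself:

#!/bin/sh
# keepalived calls this with: $1 = GROUP|INSTANCE, $2 = name, $3 = new state
case "$3" in
    MASTER)       service openvpn start ;;
    BACKUP|FAULT) service openvpn stop ;;
esac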

October 26, 2014 12:00 AM

October 25, 2014

That grumpy BSD guy

The Book of PF, 3rd Edition is Here, First Signed Copy Can Be Yours

Continuing the tradition started by Michael Lucas with the Absolute OpenBSD, 2nd edition auction, I will be auctioning off the first signed copy of the Book of PF, 3rd edition.

Updated - the ebay auction is live - see below
 
Today I took delivery of two boxes full of my The Book of PF, 3rd edition author copies. They are likely the first to arrive in Norway as well (a few North Americans received their copies early last week), but of course this is somewhere in the range of hard to impossible to verify.

Anyway, here is the long anticipated with-book selfie:


(larger size available here)

The writing process and the subsequent editing and proofing steps that you, dear reader, will know to appreciate took significantly longer than I had expected, but this edition of the book has the good luck to become available just before the release of OpenBSD that it targets. My original plan was to be in sync with the OpenBSD 5.5 release, but to nobody's surprise but mine the process took longer than I had wanted it to.

As regular readers will know already, the main reason this edition exists is that from OpenBSD 5.5 on, we have a new traffic shaping system to replace the more than 15-year-old experimental ALTQ code. The book is up to date with OpenBSD 5.6 (early preorderers have received their disks already, I hear) and while it gives some hints on how to migrate to the new queues and priorities system, it also notes that ALTQ is no longer part of OpenBSD as of version 5.6.

And of course there have been various improvements in OpenBSD since 2010 and version 4.8, which were the year and version referenced in the second edition. You will see updates reflecting at least some of those changes in various parts of the book.

Even if you're not on OpenBSD at all, this edition is an improvement over previous versions; we've taken some care to include information relevant to FreeBSD and NetBSD as well, and where there are significant differences between the systems, it's noted in the text and examples.

It could have been tempting to include specific references to Apple's operating system as well, but I made a decision early on to stay with the free systems. I have written something about PF and Apple, but not in the book -- see my Call for Testing article How Apple Treats The Gift Of Open Source: The OpenBSD PF Example for a few field notes.

But now for the main item. For this edition, for a limited time only, there will be a

Book of PF Auction

You have a chance to own the first author signed copy of The Book of PF, 3rd edition.

The auction is up at http://www.ebay.com/itm/The-Book-of-PF-3rd-ed-signed-by-the-author-First-Copy-signed-/321563281902? - I'll look into extending the auction period; for some odd reason the max offered was 10 days. If your bid is not the successful one, I strongly urge you to make a direct donation of the same amount to the OpenBSD Foundation instead.

I've signed the book, and will fill in the missing spaces once we have the name and amount:




UPDATE 2014-10-26 01:00 CEST: Whatever it was that stopped ebay from listing the auction was resolved. The auction is up at http://www.ebay.com/itm/The-Book-of-PF-3rd-ed-signed-by-the-author-First-Copy-signed-/321563281902? - I'll look into extending the auction period; for some odd reason the max offered was 10 days. If your bid is not the successful one, I strongly urge you to make a direct donation of the same amount to the OpenBSD Foundation instead.

The first signed copy, and incidentally also the first copy my wife picked out of the first box we opened, will come with this inscribed in my handwriting on the title page:

FOR (your name)
Winner of the 2014 Book of PF Auction
Thank you for Supporting OpenBSD with your
(CAD, USD or EUR amount) donation

Bergen, (date), (my signature)

That's just for your reference. My handwriting is not a pretty sight at the best of times, and when you, the lucky winner, receive the book it's entirely reasonable that you will not be able to decipher the scrawls at all.

If you think your chances of actually winning are not worth considering, please head over to the OpenBSD donations or orders page and spend some of your (or your boss') hard earned cash!

My speaking schedule has not been set for the upcoming months, but there is a reasonable chance I'll attend at least a few BSD events in the near future. See you there!

by noreply@blogger.com (Peter N. M. Hansteen) at October 25, 2014 05:42 PM

Chris Siebenmann

The difference in available pool space between zfs list and zpool list

For a while I've noticed that 'zpool list' would report that our pools had more available space than 'zfs list' did and I've vaguely wondered about why. We recently had a very serious issue due to a pool filling up, so suddenly I became very interested in the whole issue and did some digging. It turns out that there are two sources of the difference depending on how your vdevs are set up.

For raidz vdevs, the simple version is that 'zpool list' reports more or less the raw disk space before the raidz overhead while 'zfs list' applies the standard estimate that you expect (ie that N disks worth of space will vanish for a raidz level of N). Given that raidz overhead is variable in ZFS, it's easy to see why the two commands are behaving this way.

In addition, in general ZFS reserves a certain amount of pool space for various reasons, for example so that you can remove files even when the pool is 'full' (since ZFS is a copy on write system, removing files requires some new space to record the changes). This space is sometimes called 'slop space'. According to the code this reservation is 1/32nd of the pool's size. In my actual experimentation on our OmniOS fileservers this appears to be roughly 1/64th of the pool and definitely not 1/32nd of it, and I don't know why we're seeing this difference.
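As a quick way to see the difference yourself (a sketch; 'tank' is a placeholder pool name and the numbers in the comment are just illustrative arithmetic):

zpool list -o name,size,allocated,free tank
zfs list -o name,used,available tank

# back-of-the-envelope: on, say, 3.6T of zfs-visible space, a 1/32
# reservation holds back roughly 115G while 1/64 is roughly 57G, which is
# the scale of difference being described above.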

(I found out all of this from a Ben Rockwood blog entry and then found the code in the current Illumos codebase to see what the current state was (or is).)

The actual situation with what operations can (or should) use what space is complicated. Roughly speaking, user level writes and ZFS operations like 'zfs create' and 'zfs snapshot' that make things should use the 1/32nd reserved space figure, file removes and 'neutral' ZFS operations should be allowed to use half of the slop space (running the pool down to 1/64th of its size), and some operations (like 'zfs destroy') have no limit whatever and can theoretically run your pool permanently and unrecoverably out of space.

The final authority is the Illumos kernel code and its comments. These days it's on Github so I can just link to the two most relevant bits: spa_misc.c's discussion of spa_slop_shift and dsl_synctask.h's discussion of zfs_space_check_t.

(What I'm seeing with our pools would make sense if everything was actually being classified as a 'allowed to use half of the slop space' operation. I haven't traced the Illumos kernel code at this level so I have no idea how this could be happening; the comments certainly suggest that it isn't supposed to be.)

(This is the kind of thing that I write down so I can find it later, even though it's theoretically out there on the Internet already. Re-finding things on the Internet can be a hard problem.)

by cks at October 25, 2014 06:06 AM

RISKS Digest

October 24, 2014

Everything Sysadmin

How to make change when handed a $20... and help democracy

If someone owes you $5.35 and hands you a $20 bill, every reader of this blog can easily make change. You have a calculator, a cash register, or you do it in your head.

However there is a faster way that I learned when I was 12.

Today it is rare to get home delivery of a newspaper, but if you do, you probably pay by credit card directly to the newspaper company. It wasn't always like that. When I was 12 years old I delivered newspapers for The Daily Record. Back then payments were collected by visiting each house every other week. While I did eventually switch to leaving envelopes for people to leave payments for me, there was a year or so where I visited each house and collected payment directly.

Let's suppose someone owed me $5.35 and handed me a $20 bill. Doing math in real time is slow and error prone, especially if you are 12 years old and tired from lugging newspapers around.

Instead of thinking in terms of $20 minus $5.35, think in terms of equilibrium. They are handing you $20 and you need to hand back $20... the $5.35 in newspapers they've received plus the change that will total $20 and reach equilibrium.

So you basically count starting at $5.35. You say out loud, "5.35" then hand them a nickel and say "plus 5 makes 5.40". Next you hand them a dime and say "plus 10 makes 5.50". Now you can hand them 50 cents, and say "plus 50 cents makes 6". Getting from 6 to 20 is a matter of handing them 4 singles and counting out loud "7, 8, 9, and 10" as you hand them each single. Next you hand them 10 and say "and 10 makes 20".

Notice that the complexity of subtraction has been replaced by counting, which is much easier. This technique is less prone to error, and makes it easier for the customer to verify what you are doing in real time because they see what you are doing along the way. It is more transparent.

Buy a hotdog from a street vendor and you'll see them do the same thing. It may cost $3, and they'll count starting at 3 as they hand you bills, "3..., 4, 5, and 5 is 10, and 10 is 20."

I'm sure that a lot of people reading this blog are thinking, "But subtraction is so easy!" Well, it is, but this is easiER and less error prone. There are plenty of things you could do the hard way and I hope you don't.

It is an important life skill to be able to do math without a calculator and this is one of the most useful tricks I know.

So why is this so important that I'm writing about it on my blog?

There are a number of memes going around right now that claim the Common Core curriculum standards in the U.S. are teaching math "wrong". They generally show a math homework assignment like 20-5.35 as being marked "wrong" because the student wrote 14.65 instead of .05+.10+.50+4+10.

What these memes aren't telling you is they are based on a misunderstanding of the Common Core requirements. The requirement is that students are to be taught both ways and that the "new way" is such that they can do math without a calculator. It is important that, at a young age, children learn that there are multiple equivalent ways of getting the same answer in math. The multi-connectedness of mathematics is an important concept, much more important than the rote memorization of addition and multiplication tables.

If you've ever mocked the way people are being trained to "stop thinking and just press buttons on a cash register" then you should look at this "new math" as a way to turn that around. If not, what do you propose? Not teaching them to think about math in higher terms?

In the 1960s there was the "new math" movement, which was mocked extensively. However if you look at what "new math" was trying to do: it was trying to prepare students for the mathematics required for the space age where engineering and computer science would be primary occupations. I think readers of this blog should agree that is a good goal.

One of the 1960s "new math" ideas that was mocked was that it tried to teach Base 8 math in addition to normal Base 10. This was called "crazy" at the time. It wasn't crazy at all. It was recognized by educators that computers were going to be a big deal in the future (correct) and to be a software developer you needed to understand binary and octal (mostly correct) or at least have an appreciation for them (absolutely correct). History has proven the naysayers to be wrong.

When I was in 5th grade (1978-9) my teacher taught us base 8, 2 and 12. He told us this was not part of the curriculum but he felt it was important. He was basically teaching us "new math" even though it was no longer part of the curriculum. Later when I was learning about computers the concept of binary and hexadecimal didn't faze me because I had already been exposed to other bases. While other computer science students were struggling, I had an advantage because I had been exposed to these strange base systems.

One of these anti-Common Core memes includes a note from a father who claims he has a Bachelor of Science Degree in Electronics Engineering which included an extensive study of differential equations and even he is unable to explain the Common Core. Well, he must be a terrible engineer, since the question was not about doing the math but about finding the off-by-one error in the diagram. To quote someone on G+, "The supposed engineer must suck at his work if he can't follow the process, debug each step, and find the off-by-one error."

Beyond the educational value or non-value of Common Core, what really burns my butt is the fact that all these memes come from one of 3 sources:

  • Organizations that criticize anything related to public education while at the same time they criticize any attempt to improve it. You can't have it both ways.
  • Organizations who just criticize anything Obama is for, to the extent that if Obama changes his mind they flip and reverse their position too.
  • Organizations backed by companies that either benefit from ignorance, or profit from the privatization of education. This is blatant and cynical.

Respected computer scientist, security guru, and social commentator Gene "Spaf" Spafford recently blogged "There is an undeniable, politically-supported growth of denial -- and even hatred -- of learning, facts, and the educated. Greed (and, most likely, fear of minorities) feeds demagoguery. Demagoguery can lead to harmful policies and thereafter to mob actions."

These math memes are part of that problem.

A democracy only works if the populace is educated. Education makes democracy work. Ignorance robs us of freedom because it permits us to be controlled by fear. Education gives us economic opportunities and jobs, which permit us to maintain our freedom to move up in social strata. Ignorance robs people of the freedom to have economic mobility. The best way we can show our love for our fellow citizens, and all people, is to ensure that everyone receives the education they need to do well today and in the future. However it is not just about love. There is nothing more greedy you can do than to make sure everyone is highly educated because it grows the economy and protects your own freedom too.

Sadly, Snopes and skeptics.stackexchange.com can only do so much. Fundamentally we need a much bigger solution.

October 24, 2014 03:00 PM

Standalone Sysadmin

Accidental DoS during an intentional DoS

Funny, I remember always liking DOS as a kid...

Anyway, on Tuesday, I took a day off, but ended up getting a call at home from my boss at 4:30pm or so. We were apparently causing a DoS attack, he said, and the upstream university had disabled our net connection. He was trying to conference in the central network (ITS) admins so we could figure out what was going on.

I sat down at my computer and was able to connect to my desktop at work, so the entire network wasn't shut down. It looked like what they had done was actually turn off out-bound DNS, which made me suspect that one of the machines on our network was performing a DNS reflection attack, but this was just a sign of my not thinking straight. If that had been the case, they would have shut down inbound DNS rather than outbound.

After talking with them, I learned that they had seen something on our network initiating a denial of service attack on DNS servers using hundreds of spoofed source IPs. Looking at graphite for that time, I suspect you'll agree when I say, "yep":

Initially, the malware was spoofing IPs from all kinds of IP ranges, not just things in our block. As it turns out, I didn't have the sanity check on my egress ACLs on my gateway that said, "nothing leaves that isn't in our IP block", which is my bad. As soon as I added that, a lot of the traffic died. Unfortunately, because the university uses private IP space in the 10.x.x.x range, I couldn't block that outbound. And, of course, the malware quickly caught up to speed and started exclusively using 10.x addresses to spoof from. So we got shut down again.
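For what it's worth, the sanity check in question is simply "drop anything leaving with a source address outside our block". Expressed as iptables purely for illustration (the real gateway may not be Linux, and 192.0.2.0/24 and eth1 stand in for the local block and the uplink interface):

iptables -A FORWARD -o eth1 ! -s 192.0.2.0/24 -j DROP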

Over the course of a day, here's what the graph looked like:

Now, on the other side of the coin, I'm sure you're screaming "SHUT DOWN THE STUPID MACHINE DOING THIS", because I was too. The problem was that I couldn't find it. Mostly because of my own ineptitude, as we'll see.

Alright, it's clear from the graph above that there were some significant bits being thrown around. That should be easy to track. So, lets fire up graphite and figure out what's up.

Most of my really useful graphs are thanks to the ironically named Unhelpful Graphite Tip #6, where Jason Dixon describes the "mostDeviant" function, which is pure awesome. The idea is that, if you have a BUNCH of metrics, you probably can't see much useful information because there are so many lines. So instead, you probably want the few weirdest metrics out of that collection, and that's what you get. Here's how it works.

In the graphite box, set the time frame that you're looking for:

Then add the graph data that you're looking for. Wildcards are super-useful here. Since the uplink graph above is a lot of traffic going out of the switch (tx), I'm going to be looking for a lot of data coming into the switch (rx). The metric that I'll use is:


CCIS.systems.linux.Core*.snmp.if_octets-Ethernet*.rx

That metric, by itself, looks like this:

There's CLEARLY a lot going on there. So we'll apply the mostDeviant filter:

and we'll select the top 4 metrics. At this point, the metric line looks like this:


mostDeviant(4,CCIS.systems.linux.Core*.snmp.if_octets-Ethernet*.rx)

and the graph is much more manageable:

Plus, most usefully, now I have port numbers to investigate. Back to the hunt!

As it turns out, those two ports are running to...another switch. An old switch that isn't being used by more than a couple dozen hosts. It's destined for the scrap heap, and because of that, when I was setting up collectd to monitor the switches using the snmp plugin, I neglected to add this switch. You know, because I'm an idiot.

So, I quickly modified the collectd config and pushed the change up to the puppet server, then refreshed the puppet agent on the host that does snmp monitoring and started collecting metrics. Except that, at the moment, the attack had stopped...so it was a waiting game that might never actually happen again. As luck would have it, the attack started again, and I was able to trace it to a port:

Gotcha!

(notice how we actually WERE under attack when I started collecting metrics? It was just so tiny compared to the full on attack that we thought it might have been normal baseline behavior. Oops)

So, checking that port led to...a VM host. And again, I encountered a road block.

I've been having an issue with some of my VMware ESXi boxes where they will encounter occasional extreme disk latency and fall out of the cluster. There are a couple of knowledgebase articles ([1] [2]) that sort-of kind-of match the issue, but not entirely. In any event, I haven't ironed it out. The VMs are fine during the disconnected phase, and the fix is to restart the management agents through the console, which I was able to do and then I could manage the host again.

Once I could get a look, I could see that there wasn't a lot on that machine - around half a dozen VMs. Unfortunately, because the host had been disconnected from the vCenter Server, stats weren't being collected on the VMs, so we had to wait a little bit to figure out which one it was. But we finally did.

In the end, the culprit was a NetApp Balance appliance. There's even a knowledge base article on it being vulnerable to ShellShock. Oops. And why was that machine even available to the internet at large? Double oops.

I've snapshotted that machine and paused it. We'll probably have some of the infosec researchers do forensics on it, if they're interested, but that particular host wasn't even being used. VM cruft is real, folks.

Now, back to the actual problem...

The network uplink to the central network happens over a pair of 10Gb/s fiber links. According to the graph, you can see that the VM was pushing 100MB/s (800Mb/s). This is clearly Bad(tm), but it's not world-ending bad for the network, right? Right. Except...

Upstream of us, we are going through an in-line firewall (that, like OUR equipment, was not set to filter egress traffic based on spoofed source IPs - oops, but not for me, finally!). We are assigned to one of five virtual firewalls on that one physical piece of hardware...despite that, the actual physical piece of hardware has a limit of around a couple hundred thousand concurrent sessions.

For a network this size, that is probably(?) reasonable, but a session counts as a stream of packets between a source IP and a destination IP. Every time you change the source IP, you get a new session, and when you spoof thousands of source IPs...guess what? And since it's a per-physical-device limit, our one rogue VM managed to take out the resources of the big giant firewall.

In essence, this one intentional DoS attack on a couple of hosts in China successfully DoS'd our university as sheer collateral damage. Oops.

So, we're working on ways to fix things. A relatively simple step is to prevent egress traffic from IPs that aren't our own. This is done now. We've also been told that we need to block egress DNS traffic, except from known hosts or to public DNS servers. This is in place, but I really question its efficacy. So we're blocking DNS. There are a lot of other protocols that use UDP, too. NTP reflection attacks are a thing. Anyway, we're now blocking egress DNS and I've had to special-case a half-dozen research projects, but that's fine by me.

In terms of things that will make an actual difference, we're going to be re-evaluating the policies in place for putting VMs on publicly-accessible networks, and I think it's likely that there will need to be justification for providing external access to new resources, whereas in the past, it's just been the default to leave things open because we're a college, and that's what we do, I guess. I've never been a fan of that, from a security perspective, so I'm glad it's likely to change now.

So anyway, that's how my week has been. Fortunately, it's Friday, and my sign is up to "It has been [3] Days without a Network Apocalypse".

by Matt Simmons at October 24, 2014 09:26 AM

Chris Siebenmann

In Go I've given up and I'm now using standard packages

In my Go programming, I've come around to an attitude that I'll summarize as 'there's no point in fighting city hall'. What this means is that I'm now consciously using standard packages that I don't particularly like just because they are the standard packages.

I'm on record as disliking the standard flag package, for example, and while I still believe in my reasons for this I've decided that it's simply not worth going out of my way over it. The flag package works and it's there. Similarly, I don't think that the log package is necessarily a great solution for emitting messages from Unix style command line utilities but in my latest Go program I used it anyways. It was there and it wasn't worth the effort to code warn() and die() functions and so on.

Besides, using flag and log is standard Go practice so it's going to be both familiar to and expected by anyone who might look at my code someday. There's a definite social benefit to doing things the standard way for anything that I put out in public, much like most everyone uses gofmt on their code.

In theory I could find and use some alternate getopt package (these days the go-to place to find one would be godoc.org). In practice I find using external packages too much of a hassle unless I really need them. This is an odd thing to say about Go, considering that it makes them so easy and accessible, but depending on external packages comes with a whole set of hassles and concerns right now. I've seen a bit too much breakage to want that headache without a good reason.

(This may not be a rational view for Go programming, given that Go deliberately makes using people's packages so easy. Perhaps I should throw myself into using lots of packages just to get acclimatized to it. And in practice I suspect most packages don't break or vanish.)

PS: note that this is different from the people who say you should eg use the testing package for your testing because you don't really need anything more than what it provides and stick with the standard library's HTTP stuff rather than getting a framework. As mentioned, I still think that flag is not the right answer; it's just not wrong enough to be worth fighting city hall over.

Sidebar: Doing standard Unix error and warning messages with log

Here's what I do:

log.SetPrefix("<progname>: ")
log.SetFlags(0)

If I was doing this better I would derive the program name from os.Args[0] instead of hard-coding it, but if I did that I'd have to worry about various special cases and no, I'm being lazy here.
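For completeness, here's a sketch of that non-lazy version; filepath.Base covers the common case of a path in os.Args[0], though not every special case you might care about:

package main

import (
	"log"
	"os"
	"path/filepath"
)

func main() {
	// derive the message prefix from how the program was invoked
	log.SetPrefix(filepath.Base(os.Args[0]) + ": ")
	log.SetFlags(0)
	log.Println("something went wrong")
}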

by cks at October 24, 2014 05:16 AM

RISKS Digest