Planet SysAdmin


October 25, 2014

That grumpy BSD guy

The Book of PF, 3rd Edition is Here, First Signed Copy Can Be Yours

Continuing the tradition started by Michael Lucas with the Absolute OpenBSD, 2nd edition auction, I will be auctioning off the first signed copy of the Book of PF, 3rd edition.
 
Today I took delivery of two boxes full of my author copies of The Book of PF, 3rd edition. They are likely the first to arrive in Norway as well (a few North Americans received their copies early last week), but of course this is somewhere between hard and impossible to verify.

Anyway, here is the long-anticipated selfie with the book:



The writing process, and the subsequent editing and proofing steps that you, dear reader, will know to appreciate, took significantly longer than I had expected, but this edition of the book has the good luck to become available just before the release of the OpenBSD version it targets. My original plan was to be in sync with the OpenBSD 5.5 release, but to nobody's surprise but mine the process took longer than I had wanted it to.

As regular readers will already know, the main reason this edition exists is that from OpenBSD 5.5 on, we have a new traffic shaping system to replace the more than 15-year-old experimental ALTQ code. The book is up to date with OpenBSD 5.6 (early preorderers have received their disks already, I hear), and while it gives some hints on how to migrate to the new queues and priorities system, it also notes that ALTQ is no longer part of OpenBSD as of version 5.6.

And of course there have been various improvements in OpenBSD since 2010 and version 4.8, which were the year and version referenced in the second edition. You will see updates reflecting at least some of those changes in various parts of the book.

Even if you're not on OpenBSD at all, this edition is an improvement over previous versions: we've taken some care to include information relevant to FreeBSD and NetBSD as well, and where there are significant differences between the systems, they're noted in the text and examples.

It could have been tempting to include specific references to Apple's operating system as well, but I made a decision early on to stay with the free systems. I have written something about PF and Apple, but not in the book -- see my Call for Testing article How Apple Treats The Gift Of Open Source: The OpenBSD PF Example for a few field notes.

But now for the main item. For this edition, for a limited time only, there will be a

Book of PF Auction

You have a chance to own the first author signed copy of The Book of PF, 3rd edition. To enter, you need only to make a donation to the OpenBSD Foundation during the next month.

Here are the details:

I will give the first author signed copy of The Book of PF, third edition to whoever sends the largest donation marked "Book of PF Auction" to the OpenBSD Foundation by November 25th, 2014.

Please note that for this auction, only online donations such as PayPal or Bitcoin will be accepted, and the Foundation will not provide invoices or the regular paperwork for individual transfers that form part of this auction (I think they prefer to treat the auction as one donation, but I could be mistaken in that detail). And of course, make sure you give a valid mailing address.

As soon as practical after that date, the Foundation will notify the winner and me, and we will publish the winning amount along with the total amount raised and, if the winner agrees, the winner's name. I will mail the hopefully well-preserved first signed copy to the winner as soon as I have their mailing address.

The first signed copy, and incidentally also the first copy my wife picked out of the first box we opened, will come with this inscribed in my handwriting on the title page:

FOR (your name)
Winner of the 2014 Book of PF Auction
Thank you for Supporting OpenBSD with your
(CAD, USD or EUR amount) donation

Bergen, (date), (my signature)

That's just for your reference. My handwriting is not a pretty sight at the best of times, and when you, the lucky winner, receive the book, it's entirely reasonable that you will not be able to decipher the scrawls at all.

Now go on, donate to enter!

If you think your chances of actually winning are not worth considering, please head over to the OpenBSD donations or orders page and spend some of your (or your boss's) hard-earned cash!

My speaking schedule has not been set for the upcoming months, but there is a reasonable chance I'll attend at least a few BSD events in the near future. See you there!

by noreply@blogger.com (Peter N. M. Hansteen) at October 25, 2014 05:42 PM

Chris Siebenmann

The difference in available pool space between zfs list and zpool list

For a while I've noticed that 'zpool list' would report that our pools had more available space than 'zfs list' did and I've vaguely wondered about why. We recently had a very serious issue due to a pool filling up, so suddenly I became very interested in the whole issue and did some digging. It turns out that there are two sources of the difference depending on how your vdevs are set up.

For raidz vdevs, the simple version is that 'zpool list' reports more or less the raw disk space before the raidz overhead, while 'zfs list' applies the standard estimate that you expect (i.e., that N disks' worth of space will vanish for a raidz level of N). Given that raidz overhead is variable in ZFS, it's easy to see why the two commands behave this way.
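A back-of-the-envelope sketch of the raidz part of the difference, in Go since that's the language that shows up elsewhere on this page. It only models the simple "parity disks' worth of space vanishes" estimate, not ZFS's real variable per-block overhead, and the six-disk raidz2 pool is a hypothetical example.

```go
package main

import "fmt"

// tib is one tebibyte in bytes; it only makes the example readable.
const tib = uint64(1) << 40

// rawSize approximates what 'zpool list' reports for a raidz vdev:
// the space of all member disks, before raidz overhead.
func rawSize(disks int, diskBytes uint64) uint64 {
	return uint64(disks) * diskBytes
}

// usableEstimate approximates the 'zfs list' view: for raidz level N,
// N disks' worth of space is assumed to go to parity. It ignores the
// variable per-block overhead that makes real raidz accounting fuzzy.
func usableEstimate(disks, parity int, diskBytes uint64) uint64 {
	return uint64(disks-parity) * diskBytes
}

func main() {
	// Hypothetical pool: one raidz2 vdev of six 4 TiB disks.
	raw := rawSize(6, 4*tib)
	usable := usableEstimate(6, 2, 4*tib)
	fmt.Printf("zpool-list-like: %d TiB, zfs-list-like: %d TiB\n", raw/tib, usable/tib)
}
```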

In addition, ZFS generally reserves a certain amount of pool space for various reasons, for example so that you can remove files even when the pool is 'full' (since ZFS is a copy-on-write system, removing files requires some new space to record the changes). This space is sometimes called 'slop space'. According to the code, this reservation is 1/32nd of the pool's size. In my actual experimentation on our OmniOS fileservers, it appears to be roughly 1/64th of the pool and definitely not 1/32nd of it, and I don't know why we're seeing this difference.
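The slop reservation can be sketched the same way. This assumes the reservation is simply the pool size shifted right (a shift of 5 for 1/32nd, 6 for roughly 1/64th) and ignores raidz overhead and any minimum sizes the real code may apply; the 64 GiB pool is hypothetical.

```go
package main

import "fmt"

// gib is one gibibyte in bytes.
const gib = uint64(1) << 30

// slopSpace approximates the reservation as poolSize >> shift.
// A shift of 5 gives the 1/32nd the code describes; a shift of 6 gives
// the roughly 1/64th observed in practice.
func slopSpace(poolSize uint64, shift uint) uint64 {
	return poolSize >> shift
}

// zfsAvail approximates the free space a dataset-level view would report
// once the slop reservation is held back (raidz overhead ignored).
func zfsAvail(poolSize, used uint64, shift uint) uint64 {
	if used >= poolSize {
		return 0
	}
	free := poolSize - used
	slop := slopSpace(poolSize, shift)
	if free <= slop {
		return 0
	}
	return free - slop
}

func main() {
	// Hypothetical 64 GiB pool with 32 GiB in use.
	size, used := 64*gib, 32*gib
	fmt.Println("zpool-list-like free:", (size-used)/gib, "GiB")
	fmt.Println("with 1/32 slop:", zfsAvail(size, used, 5)/gib, "GiB")
	fmt.Println("with 1/64 slop:", zfsAvail(size, used, 6)/gib, "GiB")
}
```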

(I found out all of this from a Ben Rockwood blog entry and then found the code in the current Illumos codebase to see what the current state was (or is).)

The actual situation with what operations can (or should) use what space is complicated. Roughly speaking, user level writes and ZFS operations like 'zfs create' and 'zfs snapshot' that make things should use the 1/32nd reserved space figure, file removes and 'neutral' ZFS operations should be allowed to use half of the slop space (running the pool down to 1/64th of its size), and some operations (like 'zfs destroy') have no limit whatever and can theoretically run your pool permanently and unrecoverably out of space.
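The three kinds of limits can be modelled as a small decision function. The type and constant names here are hypothetical, only loosely modelled on zfs_space_check_t, and the arithmetic follows the description above rather than the actual Illumos code.

```go
package main

import "fmt"

// gib is one gibibyte in bytes.
const gib = uint64(1) << 30

// spaceCheck is a hypothetical stand-in for the kinds of space check
// described in dsl_synctask.h; the names are made up for this sketch.
type spaceCheck int

const (
	checkNormal   spaceCheck = iota // user writes, 'zfs create', 'zfs snapshot'
	checkReserved                   // file removes and 'neutral' operations
	checkNone                       // e.g. 'zfs destroy': no limit at all
)

// allowed reports whether an operation needing `need` bytes may proceed
// on a pool of `size` bytes with `used` bytes in use, assuming 1/32nd slop.
func allowed(check spaceCheck, size, used, need uint64) bool {
	if check == checkNone {
		return true // no space check, which is how a pool can be run out of space
	}
	var free uint64
	if used < size {
		free = size - used
	}
	slop := size >> 5 // 1/32nd of the pool
	if check == checkReserved {
		slop /= 2 // may use half of the slop space, down to 1/64th
	}
	return need+slop <= free
}

func main() {
	// Hypothetical 64 GiB pool with 1.5 GiB free (slop is 2 GiB).
	size := 64 * gib
	used := size - 3*gib/2
	need := uint64(1) << 20 // a 1 MiB operation
	fmt.Println("write:  ", allowed(checkNormal, size, used, need))
	fmt.Println("remove: ", allowed(checkReserved, size, used, need))
	fmt.Println("destroy:", allowed(checkNone, size, used, need))
}
```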

The final authority is the Illumos kernel code and its comments. These days it's on GitHub, so I can just link to the two most relevant bits: spa_misc.c's discussion of spa_slop_shift and dsl_synctask.h's discussion of zfs_space_check_t.

(What I'm seeing with our pools would make sense if everything was actually being classified as an 'allowed to use half of the slop space' operation. I haven't traced the Illumos kernel code at this level so I have no idea how this could be happening; the comments certainly suggest that it isn't supposed to be.)

(This is the kind of thing that I write down so I can find it later, even though it's theoretically out there on the Internet already. Re-finding things on the Internet can be a hard problem.)

by cks at October 25, 2014 06:06 AM

RISKS Digest

October 24, 2014

Everything Sysadmin

How to make change when handed a $20... and help democracy

If someone owes you $5.35 and hands you a $20 bill, every reader of this blog can easily make change. You have a calculator, a cash register, or you do it in your head.

However there is a faster way that I learned when I was 12.

Today it is rare to get home delivery of a newspaper, but if you do, you probably pay by credit card directly to the newspaper company. It wasn't always like that. When I was 12 years old I delivered newspapers for The Daily Record. Back then payments were collected by visiting each house every other week. While I did eventually switch to leaving envelopes for people to leave payments for me, there was a year or so where I visited each house and collected payment directly.

Let's suppose someone owed me $5.35 and handed me a $20 bill. Doing math in real time is slow and error prone, especially if you are 12 years old and tired from lugging newspapers around.

Instead of thinking in terms of $20 minus $5.35, think in terms of equilibrium. They are handing you $20, and what they get in return must also add up to $20: the $5.35 in newspapers they've received plus the change.

So you basically count up starting at $5.35. You say out loud, "5.35", then hand them a nickel and say "plus 5 makes 5.40". Next you hand them a dime and say "plus 10 makes 5.50". Now you can hand them 50 cents and say "plus 50 cents makes 6". Getting from 6 to 10 is a matter of handing them 4 singles and counting out loud "7, 8, 9, and 10" as you hand them each single. Finally you hand them a $10 bill and say "and 10 makes 20".

Notice that the complexity of subtraction has been replaced by counting, which is much easier. This technique is less prone to error, and makes it easier for the customer to verify what you are doing in real time because they see what you are doing along the way. It is more transparent.

Buy a hotdog from a street vendor and you'll see them do the same thing. It may cost $3, and they'll count starting at 3 as they hand you bills, "3..., 4, 5, and 5 is 10, and 10 is 20."

I'm sure that a lot of people reading this blog are thinking, "But subtraction is so easy!" Well, it is, but this is easiER and less error prone. There are plenty of things you could do the hard way, and I hope you don't.

It is an important life skill to be able to do math without a calculator and this is one of the most useful tricks I know.

So why is this so important that I'm writing about it on my blog?

There are a number of memes going around right now that claim the Common Core curriculum standards in the U.S. are teaching math "wrong". They generally show a math homework assignment like 20-5.35 as being marked "wrong" because the student wrote 14.65 instead of .05+.10+.50+4+10.

What these memes aren't telling you is that they are based on a misunderstanding of the Common Core requirements. The requirement is that students are to be taught both ways and that the "new way" is such that they can do math without a calculator. It is important that, at a young age, children learn that there are multiple equivalent ways of getting the same answer in math. The multi-connectedness of mathematics is an important concept, much more important than the rote memorization of addition and multiplication tables.

If you've ever mocked the way people are being trained to "stop thinking and just press buttons on a cash register" then you should look at this "new math" as a way to turn that around. If not, what do you propose? Not teaching them to think about math in higher terms?

In the 1960s there was the "new math" movement, which was mocked extensively. However, if you look at what "new math" was trying to do, it was trying to prepare students for the mathematics required for the space age, where engineering and computer science would be primary occupations. I think readers of this blog should agree that is a good goal.

One of the 1960s "new math" ideas that was mocked was that it tried to teach Base 8 math in addition to normal Base 10. This was called "crazy" at the time. It wasn't crazy at all. It was recognized by educators that computers were going to be a big deal in the future (correct) and to be a software developer you needed to understand binary and octal (mostly correct) or at least have an appreciation for them (absolutely correct). History has proven the naysayers wrong.

When I was in 5th grade (1978-9) my teacher taught us base 8, 2 and 12. He told us this was not part of the curriculum but he felt it was important. He was basically teaching us "new math" even though it was no longer part of the curriculum. Later when I was learning about computers the concept of binary and hexadecimal didn't faze me because I had already been exposed to other bases. While other computer science students were struggling, I had an advantage because I had been exposed to these strange base systems.

One of these anti-Common Core memes includes a note from a father who claims he has a Bachelor of Science degree in Electronics Engineering, which included an extensive study of differential equations, and that even he is unable to explain the Common Core. Well, he must be a terrible engineer, since the question was not about doing the math but about finding the off-by-one error in the diagram. To quote someone on G+, "The supposed engineer must suck at his work if he can't follow the process, debug each step, and find the off-by-one error."

Beyond the educational value or non-value of Common Core, what really burns my butt is the fact that all these memes come from one of 3 sources:

  • Organizations that criticize anything related to public education while at the same time they criticize any attempt to improve it. You can't have it both ways.
  • Organizations that just criticize anything Obama is for, to the extent that if Obama changes his mind they flip and reverse their position too.
  • Organizations backed by companies that either benefit from ignorance, or profit from the privatization of education. This is blatant and cynical.

Respected computer scientist, security guru, and social commentator Gene "Spaf" Spafford recently blogged "There is an undeniable, politically-supported growth of denial -- and even hatred -- of learning, facts, and the educated. Greed (and, most likely, fear of minorities) feeds demagoguery. Demagoguery can lead to harmful policies and thereafter to mob actions."

These math memes are part of that problem.

A democracy only works if the populace is educated. Education makes democracy work. Ignorance robs us of freedom because it permits us to be controlled by fear. Education gives us economic opportunities and jobs, which permit us to maintain our freedom to move up in social strata. Ignorance robs people of the freedom to have economic mobility. The best way we can show our love for our fellow citizens, and all people, is to ensure that everyone receives the education they need to do well today and in the future. However it is not just about love. There is nothing more greedy you can do than to make sure everyone is highly educated because it grows the economy and protects your own freedom too.

Sadly, Snopes and skeptics.stackexchange.com can only do so much. Fundamentally, we need a much bigger solution.

October 24, 2014 03:00 PM

Standalone Sysadmin

Accidental DoS during an intentional DoS

Funny, I remember always liking DOS as a kid...

Anyway, on Tuesday, I took a day off, but ended up getting a call at home from my boss at 4:30pm or so. We were apparently causing a DoS attack, he said, and the upstream university had disabled our net connection. He was trying to conference in the central network (ITS) admins so we could figure out what was going on.

I sat down at my computer and was able to connect to my desktop at work, so the entire network wasn't shut down. It looked like what they had done was actually turn off outbound DNS, which made me suspect that one of the machines on our network was performing a DNS reflection attack, but this was just a sign of my not thinking straight: if that had been the case, they would have shut down inbound DNS rather than outbound.

After talking with them, they saw that something on our network had been initiating a denial of service attack on DNS servers using hundreds of spoofed source IPs. Looking at graphite for that time, I suspect you'll agree when I say, "yep":

Initially, the malware was spoofing IPs from all kinds of IP ranges, not just things in our block. As it turns out, I didn't have the sanity check in my gateway's egress ACLs that said, "nothing leaves that isn't in our IP block", which is my bad. As soon as I added that, a lot of the traffic died. Unfortunately, because the university uses private IP space in the 10.x.x.x range, I couldn't block that outbound. And, of course, the malware quickly got up to speed and started spoofing exclusively from 10.x addresses. So we got shut down again.

Over the course of a day, here's what the graph looked like:

Now, on the other side of the coin, I'm sure you're screaming "SHUT DOWN THE STUPID MACHINE DOING THIS", because I was too. The problem was that I couldn't find it. Mostly because of my own ineptitude, as we'll see.

Alright, it's clear from the graph above that there were some significant bits being thrown around. That should be easy to track. So, let's fire up graphite and figure out what's up.

Most of my really useful graphs are thanks to the ironically named Unhelpful Graphite Tip #6, where Jason Dixon describes the "mostDeviant" function, which is pure awesome. The idea is that, if you have a BUNCH of metrics, you probably can't see much useful information because there are so many lines. So instead, you probably want the few weirdest metrics out of that collection, and that's what you get. Here's how it works.

In the graphite box, set the time frame that you're looking for:

Then add the graph data that you're looking for. Wildcards are super-useful here. Since the uplink graph above is a lot of traffic going out of the switch (tx), I'm going to be looking for a lot of data coming into the switch (rx). The metric that I'll use is:


CCIS.systems.linux.Core*.snmp.if_octets-Ethernet*.rx

That metric, by itself, looks like this:

There's CLEARLY a lot going on there. So we'll apply the mostDeviant filter:

and we'll select the top 4 metrics. At this point, the metric line looks like this:


mostDeviant(4,CCIS.systems.linux.Core*.snmp.if_octets-Ethernet*.rx)

and the graph is much more manageable:

Plus, most usefully, now I have port numbers to investigate. Back to the hunt!
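Graphite's documentation describes mostDeviant as taking the standard deviation of each series and keeping the N largest. The idea is small enough to sketch in Go; this is not Graphite's code, the series type and metric names are made up for the example, and treating NaN as a missing sample is an assumption.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// series is a hypothetical named metric; NaN marks a missing data point.
type series struct {
	name   string
	points []float64
}

// stddev returns the population standard deviation of the non-NaN points.
func stddev(pts []float64) float64 {
	var n, sum float64
	for _, p := range pts {
		if !math.IsNaN(p) {
			n++
			sum += p
		}
	}
	if n == 0 {
		return 0
	}
	mean := sum / n
	var sq float64
	for _, p := range pts {
		if !math.IsNaN(p) {
			d := p - mean
			sq += d * d
		}
	}
	return math.Sqrt(sq / n)
}

// mostDeviant returns the n series with the largest standard deviation,
// most deviant first.
func mostDeviant(n int, list []series) []series {
	type ranked struct {
		s     series
		sigma float64
	}
	rs := make([]ranked, 0, len(list))
	for _, s := range list {
		rs = append(rs, ranked{s, stddev(s.points)})
	}
	sort.SliceStable(rs, func(i, j int) bool { return rs[i].sigma > rs[j].sigma })
	if n < 0 {
		n = 0
	}
	if n > len(rs) {
		n = len(rs)
	}
	out := make([]series, 0, n)
	for _, r := range rs[:n] {
		out = append(out, r.s)
	}
	return out
}

func main() {
	nan := math.NaN()
	// Hypothetical per-port rx rates.
	ports := []series{
		{"Core1.Ethernet1.rx", []float64{10, 12, 11, nan, 10}},
		{"Core1.Ethernet7.rx", []float64{10, 900, 950, 920, 15}},
		{"Core2.Ethernet3.rx", []float64{5, 5, 6, 5, 5}},
		{"Core2.Ethernet9.rx", []float64{20, 400, 380, 410, 25}},
	}
	for _, s := range mostDeviant(2, ports) {
		fmt.Printf("%s sigma=%.1f\n", s.name, stddev(s.points))
	}
}
```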

As it turns out, those two ports are running to...another switch. An old switch that isn't being used by more than a couple dozen hosts. It's destined for the scrap heap, and because of that, when I was setting up collectd to monitor the switches using the snmp plugin, I neglected to add this switch. You know, because I'm an idiot.

So, I quickly modified the collectd config and pushed the change up to the puppet server, then refreshed the puppet agent on the host that does snmp monitoring and started collecting metrics. Except that, by then, the attack had stopped, so it was a waiting game for something that might never actually happen again. As luck would have it, the attack started again, and I was able to trace it to a port:

Gotcha!

(notice how we actually WERE under attack when I started collecting metrics? It was just so tiny compared to the full on attack that we thought it might have been normal baseline behavior. Oops)

So, checking that port led to...a VM host. And again, I encountered a road block.

I've been having an issue with some of my VMware ESXi boxes where they will encounter occasional extreme disk latency and fall out of the cluster. There are a couple of knowledgebase articles ([1] [2]) that sort-of kind-of match the issue, but not entirely. In any event, I haven't ironed it out. The VMs are fine during the disconnected phase, and the fix is to restart the management agents through the console, which I was able to do and then I could manage the host again.

Once I could get a look, I could see that there wasn't a lot on that machine - around half a dozen VMs. Unfortunately, because the host had been disconnected from the vCenter Server, stats weren't being collected on the VMs, so we had to wait a little bit to figure out which one it was. But we finally did.

In the end, the culprit was a NetApp Balance appliance. There's even a knowledge base article on it being vulnerable to ShellShock. Oops. And why was that machine even available to the internet at large? Double oops.

I've snapshotted that machine and paused it. We'll probably have some of the infosec researchers do forensics on it, if they're interested, but that particular host wasn't even being used. VM cruft is real, folks.

Now, back to the actual problem...

The network uplink to the central network happens over a pair of 10Gb/s fiber links. According to the graph, the VM was pushing 100MB/s (800Mb/s). This is clearly Bad(tm), but it's not world-ending bad for the network, right? Right. Except...

Upstream of us, traffic goes through an in-line firewall (that, like OUR equipment, was not set to filter egress traffic based on spoofed source IPs - oops, but not for me, finally!). We are assigned to one of five virtual firewalls on that one physical piece of hardware... but the physical hardware as a whole has a limit of around a couple hundred thousand concurrent sessions, shared by all of them.

For a network this size, that is probably(?) reasonable, but a session counts as a stream of packets between a source IP and a destination IP. Every time you change the source IP, you get a new session, and when you spoof thousands of source IPs...guess what? And since it's a per-physical-device limit, our one rogue VM managed to take out the resources of the big giant firewall.
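Why spoofed sources hurt so much: if a session is keyed on the source/destination pair, every new fake source creates a new entry in a table whose capacity is shared across the whole device. A toy Go model under those assumptions; the capacity of 1000 and the addresses are hypothetical, and real firewalls usually track more fields (ports, protocol) per session.

```go
package main

import "fmt"

// sessionKey models a session as a source/destination IP pair.
type sessionKey struct{ src, dst string }

// sessionTable is a hypothetical fixed-capacity table standing in for the
// per-physical-device limit shared by all the virtual firewalls.
type sessionTable struct {
	capacity int
	sessions map[sessionKey]bool
}

func newSessionTable(capacity int) *sessionTable {
	return &sessionTable{capacity: capacity, sessions: make(map[sessionKey]bool)}
}

// admit returns false when a packet would need a new session and the
// table is already full.
func (t *sessionTable) admit(src, dst string) bool {
	k := sessionKey{src, dst}
	if t.sessions[k] {
		return true // existing session, no new state needed
	}
	if len(t.sessions) >= t.capacity {
		return false
	}
	t.sessions[k] = true
	return true
}

func main() {
	tbl := newSessionTable(1000) // hypothetical; the real limit was a couple hundred thousand
	// One real client talking to one server needs a single session.
	for i := 0; i < 5000; i++ {
		tbl.admit("192.0.2.10", "198.51.100.53")
	}
	fmt.Println("sessions after normal traffic:", len(tbl.sessions))
	// Spoofed traffic: each packet claims a different 10.x source.
	for i := 0; i < 5000; i++ {
		src := fmt.Sprintf("10.%d.%d.%d", i>>16&255, i>>8&255, i&255)
		tbl.admit(src, "198.51.100.53")
	}
	fmt.Println("sessions after spoofing:", len(tbl.sessions))
	fmt.Println("unrelated new session admitted:", tbl.admit("192.0.2.20", "203.0.113.80"))
}
```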

In essence, this one intentional DoS attack on a couple of hosts in China successfully DoS'd our university as sheer collateral damage. Oops.

So, we're working on ways to fix things. A relatively simple step is to prevent egress traffic from IPs that aren't our own; this is done now. We've also been told that we need to block egress DNS traffic, except from known hosts or to public DNS servers. This is in place, but I really question its efficacy: we're blocking DNS, but there are a lot of other protocols that use UDP, too. NTP reflection attacks are a thing. Anyway, we're now blocking egress DNS, and I've had to special-case a half-dozen research projects, but that's fine by me.
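The DNS rule itself is simple to state as code, which also makes its limitation visible: it only looks at port 53. A Go sketch with hypothetical addresses; the packet type is a simplification, not any firewall's real rule format.

```go
package main

import "fmt"

// packet is a simplified, hypothetical view of an outbound packet; only
// the fields this rule looks at are modelled.
type packet struct {
	src, dst string
	dstPort  int
}

// dnsEgressAllowed sketches the rule: outbound traffic to port 53 passes
// only from known internal hosts or to approved public DNS servers.
// Other UDP services, such as NTP on port 123, are outside this rule.
func dnsEgressAllowed(p packet, knownHosts, publicResolvers map[string]bool) bool {
	if p.dstPort != 53 {
		return true
	}
	return knownHosts[p.src] || publicResolvers[p.dst]
}

func main() {
	known := map[string]bool{"192.0.2.53": true}    // hypothetical special-cased research host
	public := map[string]bool{"198.51.100.1": true} // hypothetical approved public resolver
	for _, p := range []packet{
		{"192.0.2.99", "198.51.100.1", 53}, // ordinary host to approved resolver
		{"192.0.2.53", "203.0.113.7", 53},  // special-cased research host
		{"192.0.2.99", "203.0.113.7", 53},  // arbitrary host to arbitrary DNS server
		{"192.0.2.99", "203.0.113.7", 123}, // NTP: not covered by this rule
	} {
		fmt.Printf("%+v allowed=%v\n", p, dnsEgressAllowed(p, known, public))
	}
}
```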

In terms of things that will make an actual difference, we're going to be re-evaluating the policies in place for putting VMs on publicly-accessible networks, and I think it's likely that there will need to be justification for providing external access to new resources, whereas in the past, it's just been the default to leave things open because we're a college, and that's what we do, I guess. I've never been a fan of that, from a security perspective, so I'm glad it's likely to change now.

So anyway, that's how my week has been. Fortunately, it's Friday, and my sign is up to "It has been [3] Days without a Network Apocalypse".

by Matt Simmons at October 24, 2014 09:26 AM

Chris Siebenmann

In Go I've given up and I'm now using standard packages

In my Go programming, I've come around to an attitude that I'll summarize as 'there's no point in fighting city hall'. What this means is that I'm now consciously using standard packages that I don't particularly like just because they are the standard packages.

I'm on record as disliking the standard flag package, for example, and while I still believe in my reasons for this I've decided that it's simply not worth going out of my way over it. The flag package works and it's there. Similarly, I don't think that the log package is necessarily a great solution for emitting messages from Unix style command line utilities but in my latest Go program I used it anyway. It was there and it wasn't worth the effort to code warn() and die() functions and so on.
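For reference, the "standard way" looks roughly like this. It is a sketch, not code from any particular program: parsing goes through flag.NewFlagSet so it can be exercised without touching os.Args, log handles the fatal error path, and the -v option is hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"log"
	"os"
)

// parseArgs parses command line arguments with the standard flag package
// and returns the hypothetical -v setting plus the remaining arguments.
func parseArgs(args []string) (bool, []string, error) {
	fs := flag.NewFlagSet("prog", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage noise out of this sketch
	verbose := fs.Bool("v", false, "report progress")
	if err := fs.Parse(args); err != nil {
		return false, nil, err
	}
	return *verbose, fs.Args(), nil
}

func main() {
	log.SetFlags(0)
	verbose, files, err := parseArgs(os.Args[1:])
	if err != nil {
		log.Fatal(err) // print the error and exit with status 1, die()-style
	}
	if verbose {
		log.Printf("%d file argument(s)", len(files))
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```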

Besides, using flag and log is standard Go practice so it's going to be both familiar to and expected by anyone who might look at my code someday. There's a definite social benefit to doing things the standard way for anything that I put out in public, much like most everyone uses gofmt on their code.

In theory I could find and use some alternate getopt package (these days the go-to place to find one would be godoc.org). In practice I find using external packages too much of a hassle unless I really need them. This is an odd thing to say about Go, considering that it makes them so easy and accessible, but depending on external packages comes with a whole set of hassles and concerns right now. I've seen a bit too much breakage to want that headache without a good reason.

(This may not be a rational view for Go programming, given that Go deliberately makes using people's packages so easy. Perhaps I should throw myself into using lots of packages just to get acclimatized to it. And in practice I suspect most packages don't break or vanish.)

PS: note that this is different from the people who say you should, e.g., use the testing package for your testing because you don't really need anything more than what it provides, and stick with the standard library's HTTP stuff rather than getting a framework. As mentioned, I still think that flag is not the right answer; it's just not wrong enough to be worth fighting city hall over.

Sidebar: Doing standard Unix error and warning messages with log

Here's what I do:

log.SetPrefix("<progname>: ")
log.SetFlags(0)

If I were doing this better I would derive the program name from os.Args[0] instead of hard-coding it, but if I did that I'd have to worry about various special cases, and no, I'm being lazy here.
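The os.Args[0] variant could look roughly like this. It is a sketch that handles only the easy cases with filepath.Base; the "prog" fallback name is a made-up placeholder, and it makes no attempt at the other special cases alluded to above.

```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

// progName derives a program name from argv[0] for use as a log prefix,
// falling back to a hypothetical default when nothing usable is left.
func progName(arg0 string) string {
	name := filepath.Base(arg0)
	if name == "." || name == string(filepath.Separator) {
		return "prog"
	}
	return name
}

func main() {
	arg0 := ""
	if len(os.Args) > 0 {
		arg0 = os.Args[0]
	}
	log.SetPrefix(progName(arg0) + ": ")
	log.SetFlags(0)
	log.Print("this is a warning") // written to stderr as "<name>: this is a warning"
}
```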

by cks at October 24, 2014 05:16 AM

RISKS Digest