Planet SysAdmin


April 27, 2015

Chris Siebenmann

The fading out of tcpwrappers and its idea

Once upon a time, tcpwrappers was a big thing in (Unix) host security. Plenty of programs supported the original TCP Wrapper library by Wietse Venema, and people wrote their own takes on the idea. But nowadays, tcpwrappers is clearly on the way out. It doesn't seem to be used very much any more in practice, fewer and fewer programs support it at all, and of the remaining ones that (still) do, some of them are removing support for it. This isn't exclusive to Wietse Venema's original version; the whole idea and approach just doesn't seem to be all that popular any more. So what happened?

I don't know for sure, but I think the simple answer is 'firewalls and operating system level packet filtering'. The core idea of tcpwrappers is application level IP access filtering, and it dates from an era where that was your only real choice. Very few things had support for packet filtering, so you had to do this in the applications (and in general updating applications is easier than updating operating systems). These days we have robust and well developed packet filtering in kernels and in firewalls, which takes care of much of the need for tcpwrappers stuff. In many cases, maintaining packet filtering rules may be easier than maintaining tcpwrappers rules, and kernel packet filtering has the advantage that it's centralized and so universally 'supported' by programs; in fact programs don't have any choice about it.

(Kernel packet filters can't do DNS lookups the way that tcpwrappers can, but using DNS lookups for anything except logging has fallen out of favour these days. Often people don't even want to do it for logging.)
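To make 'application level IP access filtering' concrete, here is a toy sketch (Python 3, my own illustration with made-up networks, not anything derived from tcpwrappers or libwrap) of a server that makes its own per-connection decision the way a libwrap-using daemon would:

import ipaddress
import socketserver

# Made-up allow list; a real tcpwrappers setup would consult /etc/hosts.allow instead.
ALLOWED = [ipaddress.ip_network('10.0.0.0/8'),
           ipaddress.ip_network('192.168.1.0/24')]

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        peer = ipaddress.ip_address(self.client_address[0])
        if not any(peer in net for net in ALLOWED):
            # The application itself drops the connection, instead of a
            # kernel packet filter or an external firewall doing it for us.
            return
        self.wfile.write(b'hello\n')

if __name__ == '__main__':
    socketserver.TCPServer(('', 9999), Handler).serve_forever()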

Having written some code that used libwrap, I think that another issue is that the general sort of API that Venema's tcpwrappers has is one that's fallen out of favour. Even using the library, what you get is basically a single threaded black box. This works sort of okay if you're forking for each new connection, but it doesn't expose a lot of controls or a lot of information and it's going to completely fall down if you want to do more sophisticated things (or control the DNS lookups it does). Basically Venema's tcpwrappers works best for things that you could at least conceive of running out of inetd.

(It's not impossible to create an API that offers more control, but then you wind up with something that is more complex as well. And once you get more complex, what programs want out of connection matching becomes much more program-specific; consider sshd's 'Match' stuff as contrasted with Apache's access controls.)

Another way of putting it is that in the modern world, we've come to see IP-level access control as something that should be handled outside the program entirely or that's deeply integrated with the program (or both). Neither really fits the tcpwrappers model, which is more 'sitting lightly on top of the program'.

(Certainly part of the decline of tcpwrappers is that in many environments we've moved IP access controls completely off end hosts and on to separate firewalls, for better or worse.)

by cks at April 27, 2015 07:04 AM

April 26, 2015

Chris Siebenmann

My complicated feelings on abandoning old but good code

Yesterday I wrote about some twelve year old Python code I have that's still running unmodified, and commented about both how it's a bit sad that Python 2 has changed so little that this code doesn't even emit warnings and that this is because Python has moved on to Python 3. This raises the obvious question: am I going to move portnanny (this code) to Python 3? My current answer is that I'm not planning to, because portnanny is basically end of life and abandoned.

I don't consider the program end of life because it's bad code or code that I would do differently if I was rewriting it today. It's EOL for a simpler reason, namely that I don't really have any use for what it does any more. This particular code was first written to be in front of an NNTP server for Usenet and then actually mostly used to be the SMTP frontend of a peculiar MTA. I haven't had any NNTP servers for a long time now and the MTA that portnanny sits in front of is obsolete and should really be replaced (the MTA lingers on only because it's still simpler to leave it alone). If and when portnanny or the MTA break, it probably won't be worth fixing them; instead that will be my cue to replace the whole system with a modern one that works better.

All of this makes me sad, partly because portnanny handles what used to be an interesting problem but mostly because I think that portnanny is actually some of the best Python code I've written. It's certainly the best tested Python code I've written; nothing else comes close. When I wrote it, I tried hard to do a good job in both structure and implementation and to follow the mantras of test driven development for Python, and I'll probably never again write Python code that is as well done. Turning my back on the best Python code I may ever write isn't entirely a great feeling and there's certainly part of me that doesn't want to, however silly that is.

(It's theoretically possible I'll write Python code this good in the future, but something significant would have to change in my job or my life in order to make writing high quality Python an important part of it.)

There's a part of me that now wants to move portnanny to Python 3 just because. But I've never been able to get really enthused about programming projects without a clear use and need, and this would be just such a 'no clear use' make work project. Maybe I'll do it anyways someday, but the odds are not good.

(Part of the reason that the odds are not good is that I think the world has moved on from using tcpwrappers like user level filtering for access control, especially when it's embodied in external programs. So not only do I not really have a use for portnanny any more, I don't think anyone else would even if they knew about it.)

by cks at April 26, 2015 05:51 AM

April 25, 2015

Chris Siebenmann

I'm still running a twelve year old Python program

I've been thinking for a while about the interesting and perhaps somewhat remarkable fact that I'm still running a twelve year old Python program (although not as intensely as I used to be). When I say twelve year old, I don't mean that the program was first written twelve years ago and is still running; I mean that the code hasn't been touched since 2004. For bonus points, it uses a compiled module and the source of the module hasn't been changed since 2004 either (although I have periodically rebuilt it, including moving it from 32-bit to 64-bit).

(I was going to say that the tests all still pass, but it turns out that I burned a now-obsolete IP address into one of them. Oops. Other than that, all the tests still pass.)

While the code uses old idioms (it entirely uses the two argument form of raise, for example), none of them are so old and questionable that Python 2 emits deprecation warnings for them. I'm actually a little bit surprised by that; even back in 2004 I was probably writing old fashioned code. Apparently it's still not too old fashioned.
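(For illustration only, this isn't code from the program itself: the old two argument form of raise looks like the first function below, versus the spelling that also works under Python 3.)

# Python 2 only: the old two-argument form of raise.
# This line is a syntax error under Python 3.
def old_style():
    raise ValueError, "something went wrong"

# The equivalent that works in both Python 2 and Python 3.
def new_style():
    raise ValueError("something went wrong")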

Some of the long life of this old code can be attributed to the fact that Python 2 has been moving slowly. In 2004 I was writing against some version of Python 2.3, and the Python 2 line stopped really changing no later than Python 2.7 in 2010. Really, I doubt anyone was in a mood to deprecate very much in Python 2 after 2007 or so, and maybe earlier (I don't know when serious Python 3 development started; 3.0 was released at the end of 2008).

(With that said, Python 2.6 did deprecate some things, and there were changes in Python 2.4 and 2.5 as well.)

My impression is that this is a reasonably surprising lifespan for an unchanged set of source code, especially in an interpreted language. Even in a compiled language like C, I'd expect to have to update some function prototypes and stuff (never mind a move from 32 bits to 64 bits). While it's certainly been convenient for me in that I haven't had to pay any attention to this program and it just worked and worked even as I upgraded my system, I find myself a little bit sad that Python 2 has moved so slowly that twelve years later my code doesn't even get a single deprecation warning.

(The flip side argument is that my code would get plenty of warnings and explosions if I ran it on Python 3. In this view the language of Python as a whole has definitely moved on and I have just chosen to stay behind in a frozen pocket of oldness that was left for people like me, people who had old code they didn't want to bother touching.)

PS: It turns out that the Github repo is somewhat ahead of the state of the code I have running on my system. Apparently I did some updates when I set up the Github repo without actually updating the installed version. Such are the hazards of having any number of copies of code directories.

by cks at April 25, 2015 05:16 AM

April 24, 2015

Chris Siebenmann

A DKMS problem I had with lingering old versions

I use the DKMS-based approach for my ZFS on Linux install, fundamentally because using DKMS makes upgrading kernels painless and convenient. It's worked well for a long time, but recently some DKMS commands, particularly 'dkms status', started erroring out with the odd message:

Error! Could not locate dkms.conf file.
File:  does not exist.

Since everything seemed to still work I shrugged my shoulders and basically ignored it. I don't know DKMS myself; as far as I've been concerned, it's just as much magic as, oh, /bin/kernel-install (which, if you're not familiar with it, is what Fedora runs to set up new kernels). I did a little bit of Internet searching for the error message but turned up nothing that seemed particularly relevant. Then today I updated to a new Fedora kernel, got this message, and in an excess of caution decided to make sure that I actually had the ZoL binary modules built and installed for the new kernel. Well, guess what? I didn't. Nor could I force them to be built for the new kernel; things like 'dkms install ...' kept failing with this error message or things like it.

(I felt very happy about checking before I rebooted the system into the new kernel and had it come up without my ZFS pools.)

I will cut to the chase. ZFS on Linux recently released version 0.6.4, when I had previously been running development versions that still called themselves 0.6.3 for DKMS purposes. When I upgraded to 0.6.4, something in the whole process left behind some 0.6.3 directory hierarchies in a DKMS area, specifically /var/lib/dkms/spl/0.6.3 and /var/lib/dkms/zfs/0.6.3. Removing these lingering directory trees made DKMS happy with life and allowed me to eventually build and install the 0.6.4 SPL and ZFS modules for the new kernel.

(The dkms.conf file(s) that DKMS was looking for are normally found in /usr/src/<pkg>-<ver>. My theory is that the lingering directories in /var/lib/dkms were fooling DKMS into thinking that spl and zfs 0.6.3 were installed, and then it couldn't find their dkms.conf files under /usr/src and errored out.)

I have no idea if this is a general DKMS issue, something that I only ran into because of various somewhat eccentric things I wound up doing on my machine, or some DKMS related thing that the ZoL packages are doing slightly wrong (which has happened before). At least I've solved it now and 'dkms status' is now happy with life.

(I can't say I've deeply increased my DKMS knowledge in the process. DKMS badly needs a 'so you're a sysadmin and something has gone wrong with a DKMS-based package, here's what you do next' document. Also, this is obviously either a bad or a buggy error message.)

by cks at April 24, 2015 05:14 AM

April 23, 2015

SysAdmin1138

I'm not a developer but...

...I'm sure spending a lot of time in code lately.

Really. Over the last five months, I'd say that 80% of my normal working hours are spent grinding on puppet code. Or training others in getting them to maybe do some puppet stuff. I've even got some continuous integration work in, building a trio of sanity-tests for our puppet infrastructure:

  • 'puppet parser validate' returns OK for all .pp files.
    • Still on the 'current' parser, we haven't gotten as far as future/puppet4 yet.
  • puppet-lint returns no errors for the modules we've cleared.
    • This required extensive style-fixes before I put it in.
  • Catalogs compile for a certain set of machines we have.
    • I'm most proud of this, as this check actually finds dependency problems unlike puppet-parser.

Completely unsurprisingly, the CI stuff has actually caught bugs before they got pushed to production! Whoa! One of these days I'll be able to grab some of the others and demo this stuff, but we're off-boarding a senior admin right now and the brain-dumping is not being done by me for a few weeks.

We're inching closer to getting things rigged so that a passing build in the 'master' branch triggers an automatic deployment. That will take a bit of thought, as some deploys (such as class-name changes) require coordinated modifications in other systems.

by SysAdmin1138 at April 23, 2015 07:30 PM

The Tech Teapot

Observations on 8 Issues of C# Weekly

At the end of 2013 I thought it would be interesting to create a C# focused weekly newsletter. I registered the domain and created a website and hooked it up to the excellent MailChimp email service. The site went live (or at least Google Analytics was added to the site) on the 18th December 2013.

And then I kind of forgot about it for a year or so.

In the meantime the website was getting subscribers. Not many, but enough to suggest that there’s an appetite for a weekly C# focused newsletter.

The original website was a simple affair, just a standard landing page written using Bootstrap. Simple it may have been, but rather effective.

I figured that I might as well give the newsletter a go. Whilst the audience wasn’t very big (it still isn’t), it was big enough to give some idea whether the newsletter was working or not.

Having now curated eight issues, I thought it about time to take stock. The first issue was sent on Friday 27th February 2015, and an issue has been sent every Friday since.

[Chart: C# Weekly subscribers]

The number of subscribers has grown over the last eight weeks, starting from a base of 194 subscribers there are now 231. A net increase of 37 in eight weeks. Of course there have been unsubscribes, but the new subscribers each week have always outnumbered them.

[Chart: C# Weekly website visitors]

The website traffic has also trended upwards over the last year, especially since the first issue was curated. Now that there are a number of pages on the website, the site is receiving more traffic from the search engines. The number of visitors is not high, but at least progress is being made.

I started a Google Adwords campaign for the newsletter fairly early on. The campaign was reasonably successful while it lasted. Unfortunately, the website must have tripped up some kind of filter because the campaign was stopped. A pity, because the campaign did convert quite well. I did appeal the decision and I was informed that Google didn’t like the fact that the site doesn’t have a business model and that it links out to other websites a lot. Curiously, both charges could have been levelled at Google in the early days.

The weekly newsletter itself is a lot of fun to produce. During the week I look out for interesting and educational C# related content and tweet it via the @C# Weekly twitter account. I then condense all of the twitter content into a single email once a week on a Friday. Not all of the tweeted content makes it into the weekly issue. There is often overlap between content produced during the week so I get to choose what’s best.

The job of curating does take longer than I originally anticipated. I suspect that each issue takes the better part of a day to produce, which is probably twice the time I anticipated. Perhaps, when I’m more experienced, I will be able to reduce the time taken to produce the issue without reducing the quality.

Time will tell.

I can certainly recommend curating a newsletter from the perspective of learning about the topic. Even after only eight weeks I feel like I have learned quite a lot about C# that I wouldn’t otherwise have known, especially about the upcoming C# version 6.

If you want to learn about a topic, I can certainly recommend curating a newsletter as a good learning tool.

One of my better ideas over the last year was to move over to Curated, a newsletter service provider founded by the folks who run the very successful iOS Dev Weekly newsletter. The tools provided by Curated have helped a lot.

[Chart: C# Weekly content interaction statistics]

One of the best features of Curated is the statistics it provides for each issue. You can learn a lot from discovering which content subscribers are most interested in. You won’t be surprised to find that the low level C# content is the most popular.

The only minor problem I’ve found is that my dog-eared old site actually converted subscribers at a higher rate than my current site. The version 1 site was converting at around 18% whereas the current site is converting at around 13%.

I have a few ideas around why the conversion rate has dropped. The current site is a bit drab and needs to be spruced up a little bit. In addition, the current site displays previous issues, including the most recent issue on the home page. I wonder if having content on the home page actually distracts people from subscribing.

All told, curating a newsletter is fun. I can thoroughly recommend it. :)

by Jack Hughes at April 23, 2015 03:10 PM

Chris Siebenmann

Upgrading machines versus reinstalling them

Yesterday I mentioned that we would be 'upgrading' the version of OmniOS on our fileservers not by using the OmniOS upgrade process but by reinstalling them. While this was partly forced by an OmniOS problem, it's actually our approach in general. We tend to take this for two reasons.

The first reason is that it leads to either simpler install instructions or more identical machines if you have to rebuild one, depending on how you approach rebuilding upgraded machines. If you upgraded a machine from OS version A to OS version B, in theory you should reinstall a replacement by going through the same process instead of directly installing OS version B. If you directly install OS version B, you have a simpler and faster install process but you almost never get an exactly identical machine.

(In fact until you actually do this as a test you can't be sure you even wind up with a fully functional replacement machine. It's always possible that there's something vital in your current build instructions that only gets set up right if you start from OS version A and then upgrade.)

The second reason is that customizations done on OS version A are not always still applicable or necessary on OS version B. Sometimes they've even become counterproductive. If you're upgrading, you have to figure out how to find these issues and then how to fix them up. If you're directly (re)installing OS version B, you get a chance to start from scratch and apply only what you need (in the form you now need it in) on OS version B, and you don't have to deal with juggling all sorts of things during the transition from version A to version B.

(Relatedly, you may have changed your mind or simply learned better since your install of OS version A. Doing a from-scratch reinstall is a great opportunity to update to what you feel is the current best practice for something.)

Mind you, there are machines and situations where in-place upgrades are less disruptive and easier to do than complete reinstalls. One of them is when the machine has complex local state that is hard to fence off or back up and restore; another is if a machine was heavily customized, especially in ad-hoc ways. And in-place upgrades can involve less downtime (especially if you don't have surplus machines to do complex juggling). This is a lot of why I do in-place live upgrades of my workstations.

by cks at April 23, 2015 05:26 AM

April 22, 2015

Everything Sysadmin

A tweet about Git

Best Tweet I've seen in months: That just about sums it up.

April 22, 2015 05:00 PM

Chris Siebenmann

Don't make /opt a filesystem on OmniOS (or probably Illumos generally)

OmniOS boot environments are in general pretty cool things, but they do create one potential worry: how much data gets captured in them and thus how much space they can consume over time. Since boot environments are ultimately ZFS snapshots and clones, the amount of space each individual one uses over time is partly a function of how much of the data captured changes over time. Taking an extreme case, if you have a very large /var/log that is full of churning logs for some reason, each boot environment and BE snapshot you have will probably wind up with its own unique copy of all of this data.

(Unchanging data is free since it's shared between all BEs and BE snapshots.)

One of the things that this can push you towards is limiting what's included in your boot environments by making some things into separate filesystems. One obvious candidate here is /opt, where you may wind up with any number of local and semi-local packages that you update and otherwise churn at a much faster rate than base OmniOS updates and upgrades. After all this is the entire point of the OmniOS KYSTY principle and the historical use of /opt.

Well, I'll cut to the chase: don't do this if you want to be able to do upgrades between OmniOS releases, at least right now. You can create separate ZFS filesystems under bits of /opt, but if you take the obvious route of making all of /opt its own ZFS filesystem things will go terribly wrong. The problem is that some core OmniOS packages that you may wind up installing (such as their GCC packages) are put into /opt but upgraded as part of making a new boot environment on major upgrades. Because a boot environment only contains things directly in /, this doesn't work too well; pkg tries to update things that aren't actually there and will either fail outright or create things in /opt in the root of your new BE, which blocks mounting the real /opt.

I will summarize the resulting OmniOS mailing list discussion as 'a separate /opt is not a supported configuration; don't do that'. At the best pkg may some day report an explicit error, so that if you're stuck in this situation you'll know and you can temporarily remove all of those OmniOS packages in /opt.

(Our solution is to abandon plans to upgrade machines from r151010 to r151014. We'll reinstall from scratch and this time we'll make the largest single piece of /opt into a filesystem instead.)

My personal view is that this means that you do not want to build or install anything in /opt. Make up your own hierarchy, maybe /local, and use that instead; that should always be safe to make into its own filesystem. OmniOS effectively owns /opt and so you should stay out.

I believe that this is a general issue with all Illumos derived distributions if they put any of their own packages in /opt, such as GCC. I have not looked at anything other than OmniOS. I don't know if it's an issue on Solaris 11; I'd like to hope not, but then I have low confidence in Oracle getting this right either.

(You may think that being concerned about disk space is so 00s, in this day of massively large hard drives. Well, inexpensive SSDs are not yet massively large and they're what we like to use as root drives these days. They're especially not large in the face of crash dumps, where OmniOS already wants a bunch of space.)

by cks at April 22, 2015 05:04 AM

April 21, 2015

SysAdmin1138

Why 'ASAP' is a craptastic deadline

Because I get to define what's 'possible', and anything is possible given enough time, management backing, and an unlimited budget.

If I don't have management backing, I will decide on my own how to fit this new ASAP in amongst my other ASAP work and the work that has actual deadlines attached to it.

If this ASAP has a time/money tradeoff, I need management backing to tell me which way to go. And what other work to let sluff in order to get the time needed.


In the end, there are only a few priority levels that people actually use.

  1. Realtime. I will stand here until I get what I need.
  2. ASAP.
  3. On this defined date or condition.
  4. Whenever you can get to it.

Realtime is a form of ASAP, but it's the kind of ASAP where the requester is highly invested in it and will keep statusing and may throw resources at it in order to get the thingy as soon as actually possible. Think major production outages.

ASAP is really 'as soon as you can get to it, unless I think that's not fast enough.' For sysadmin teams where the load-average is below the number of processors this can work pretty well. For loaded sysadmin teams, the results will not be to the liking of the open-ended deadline requestors.

On this defined date or condition is awesome, as it gives us expectations of delivery and allows us to do queue optimization.

Whenever you can get to it is like nicing a process. It'll be a while, but it'll be gotten to. Eventually.

"ASAP, but no later than [date]" is a much better way of putting it. It gives a hint to the queue optimizer as to where to slot the work amongst everything else.

Thank you.

by SysAdmin1138 at April 21, 2015 03:41 PM

Chris Siebenmann

I don't think I'm interested in containers

Containers are all the rage in system administration right now, and I can certainly see the appeal. So it feels more than a bit heretical to admit that I'm not interested in them, ultimately because I don't think they're an easy fit for our environment.

What it comes down to is two things. The first is that I think containers really work best in a situation where the 'cattle' model of servers is a good fit. By contrast, our important machines are not cattle. With a few exceptions we have only one of each machine today, so in a container world we would just be turning those singular machines into singular containers. While there are some wins for containers I'm not convinced they're very big ones and there are certainly added complexities.

The second is that we are pretty big on using different physical machines to get fault independence. As far as we're concerned it's a feature that if physical machine X dies for whatever reason, we only lose a single service. We co-locate services only infrequently and reluctantly. This obviously eliminates one of the advantages of containers, which is that you can run multiple containers on a single piece of hardware. A world where we run a base OS plus a single container on most servers is kind of a more complicated world than we have now and it's not clear what it gets us.

I can sort of imagine a world where we become a container based environment (even with our present split of services) and I can see some advantages to it. But it's clear that it would take a lot of work to completely redo everything in our environment as a substrate of base OS servers and then a stratum of ready-to-go containers deployed on top of them, and while we'd get some things out of such a switch I'm not convinced we'd get a lot.

(Such a switch would be more like a green field rebuild from total scratch; we'd probably want to throw away everything that we do now. This is just not feasible for us for various reasons, budget included.)

So the upshot of all of this is that while I think containers are interesting as a technical thing and I vaguely keep track of the whole area, I'm not actually interested in them and I have no plans to explore them, try them out, and so on. I feel oddly embarrassed by this for reasons beyond the comfortable scope of this entry, but there it is whether I like it or not.

(I was much more optimistic a few years ago, but back then I was just theorizing. Ever since then I've failed to find a problem around here where I thought 'yes, containers will make my life simpler here and I should advocate for them'. Even my one temptation due to annoyance was only a brief flirtation before sense set in.)

by cks at April 21, 2015 03:58 AM

LZone - Sysadmin

Python re.sub Examples

Examples of re.sub() usage in Python

Syntax

import re


result = re.sub(pattern, repl, string, count=0, flags=0)

Simple Examples

num = re.sub(r'abc', '', input)              # Delete pattern abc
num = re.sub(r'abc', 'def', input)           # Replace pattern abc -> def
num = re.sub(r'\s+', ' ', input)             # Eliminate duplicate whitespaces
num = re.sub(r'abc(def)ghi', r'\1', input)   # Replace a string with a part of itself

Advanced Usage

Replacement Function

Instead of a replacement string you can provide a function performing dynamic replacements based on the match string like this:
def my_replace(m):
    if m.group(0).isdigit():       # example condition: the match is a number
        return '#'                 # replacement variant 1
    return m.group(0).upper()      # replacement variant 2


result = re.sub(r'\w+', my_replace, input)

Count Replacements

When you want to know how many replacements happened, use re.subn() instead:
result = re.subn(pattern, replacement, input)
print('Result: ', result[0])
print('Replacements: ', result[1])

April 21, 2015 12:16 AM

April 20, 2015

Slaptijack

Why Isn't tmpreaper Working?

If you have a directory that you want to keep clean, tmpreaper is a great way to remove files based on how old they are. The other day, I had a directory that looked like this:

x@x:~/dump$ ls -l
-rw-r--r-- 1 x x 212268169 Mar 15 01:02 x-2015-03-15.sql.gz
-rw-r--r-- 1 x x 212270156 Mar 16 01:02 x-2015-03-16.sql.gz
-rw-r--r-- 1 x x 212276275 Mar 17 01:02 x-2015-03-17.sql.gz
-rw-r--r-- 1 x x 212308369 Mar 18 01:02 x-2015-03-18.sql.gz
-rw-r--r-- 1 x x 212315343 Mar 19 01:02 x-2015-03-19.sql.gz
-rw-r--r-- 1 x x 212324575 Mar 20 01:02 x-2015-03-20.sql.gz
-rw-r--r-- 1 x x 212341738 Mar 21 01:02 x-2015-03-21.sql.gz
-rw-r--r-- 1 x x 212375590 Mar 22 01:02 x-2015-03-22.sql.gz
-rw-r--r-- 1 x x 212392563 Mar 23 01:02 x-2015-03-23.sql.gz

I decided that I didn't need those SQL dumps older than 30 days, so I would use tmpreaper to clean them. The proper command is

tmpreaper +30d /home/x/dump

I added the --test flag just to make sure my command would work. But, alas! Nothing happened:

x@x:~/dump$ tmpreaper --test +30d /home/x/dump
(PID 4057) Pretending to clean up directory `/home/x/dump'.

That's when I realized that this is the directory in which I had recently recompressed all the files using gzip -9 to get better compression. Although ls was reporting the time the original files were created, that is not the time tmpreaper was looking at. You can see what I mean if you use ls -lc:

x@x:~/dump$ ls -lc
total 7863644
-rw-r--r-- 1 x x 212268169 Apr 11 03:03 x-2015-03-15.sql.gz
-rw-r--r-- 1 x x 212270156 Apr 11 03:06 x-2015-03-16.sql.gz
-rw-r--r-- 1 x x 212276275 Apr 11 03:09 x-2015-03-17.sql.gz
-rw-r--r-- 1 x x 212308369 Apr 11 03:12 x-2015-03-18.sql.gz
-rw-r--r-- 1 x x 212315343 Apr 11 03:15 x-2015-03-19.sql.gz
-rw-r--r-- 1 x x 212324575 Apr 11 03:17 x-2015-03-20.sql.gz
-rw-r--r-- 1 x x 212341738 Apr 11 03:20 x-2015-03-21.sql.gz
-rw-r--r-- 1 x x 212375590 Apr 11 03:23 x-2015-03-22.sql.gz
-rw-r--r-- 1 x x 212392563 Apr 11 03:26 x-2015-03-23.sql.gz
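(If you would rather not remember which ls flag shows which timestamp, Python's os.stat will show you all three at once; the filename below is just one of the dumps above.)

import os
import time

st = os.stat('x-2015-03-15.sql.gz')
print('mtime (contents last modified): ' + time.ctime(st.st_mtime))
print('ctime (inode metadata changed): ' + time.ctime(st.st_ctime))
print('atime (file last read):         ' + time.ctime(st.st_atime))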

In order to have tmpreaper look at the proper timestamp, use the --mtime flag:

x@x:~/dump$ tmpreaper --mtime --test +30d /home/x/dump
(PID 4362) Pretending to clean up directory `/home/x/dump'.
Pretending to remove file `/home/x/dump/x-2015-03-20.sql.gz'.
Pretending to remove file `/home/x/dump/x-2015-03-17.sql.gz'.
Pretending to remove file `/home/x/dump/x-2015-03-21.sql.gz'.
Pretending to remove file `/home/x/dump/x-2015-03-15.sql.gz'.
Pretending to remove file `/home/x/dump/x-2015-03-22.sql.gz'.
Pretending to remove file `/home/x/dump/x-2015-03-18.sql.gz'.
Pretending to remove file `/home/x/dump/x-2015-03-19.sql.gz'.
Pretending to remove file `/home/x/dump/x-2015-03-16.sql.gz'.

by Scott Hebert at April 20, 2015 04:12 PM

Standalone Sysadmin

Reminder (to self, too): Use Python virtualenv!

I’m really not much of a programmer, but I dabble at times, in order to make tools for myself and my colleagues. Or toys, like the time I wrote an entire MBTA library because I wanted to build a Slack integration for the local train service.

One of the things that I want to learn better, because it seems so gosh-darned helpful, is Python. I’m completely fluent (though non-expert level) in both Bash and PHP, so I’m decent at writing systems scripts and web back-ends, but I’m only passingly familiar with Perl. The way I see it, the two “modern” languages that get the most use in systems scripts are Python and Ruby, and it’s basically a toss-up for me as to which to pick.

Python seems a little more pervasive, although ruby has the benefit of basically being the language of our systems stack. Puppet, Foreman, logstash, and several other tools are all written in Ruby, and there’s a lot to be said for being fluent in the language of your tools. That being said, I’m going to learn Python because it seems easier and honestly, flying sounds so cool.

 

One of the things that a lot of intro-to-Python tutorials don’t give you is the concept of virtual environments. These are actually pretty important in a lot of ways. You don’t absolutely need them, but you’re going to make your life a lot better if you use them. There’s a really great bit on why you should use them on the Python Guide, but basically, they create an entire custom python environment for your code, segregated away from the rest of the OS. You can use a specific version of python, a specific set of modules, and so on (with no need for root access, since they’re being installed locally).

Installing virtualenv is pretty easy. You may be able to install it with your system’s package manager, or you may need to use pip. Or you could use easy_install. Python, all by itself, has several package managers. Because of course it does.

Setting up a virtual environment is straightforward, if a little kludgy-feeling. If you find that you’re going to be moving it around, maybe from machine to machine or whatever, you should probably know about the --relocatable flag.

By default, the workflow is basically: create a virtual environment, “activate” it (which mangles lots of environment variables and paths, so that Python-specific stuff runs local to the environment rather than across the entire server), configure it by installing the modules you need, write and execute your code as normal, and then deactivate the environment when you’re done, which restores all of your original environment settings.

There is also a piece of software called virtualenvwrapper that is supposed to make all of this easier. I haven’t used it, but it looks interesting. If you find yourself really offended by the aforementioned workflow, give it a shot and let me know what you think.

Also as a reminder, make sure to put your virtual environment directory in your .gitignore file, because you’re definitely using version control, right? (Right?) Right.

Here’s how I use virtual environments in my workflow:


msimmons@bullpup:~/tmp > mkdir mycode
msimmons@bullpup:~/tmp > cd mycode
msimmons@bullpup:~/tmp/mycode > git init
Initialized empty Git repository in /home/msimmons/tmp/mycode/.git/
msimmons@bullpup:~/tmp/mycode > virtualenv env
New python executable in env/bin/python
Installing setuptools, pip...done.
msimmons@bullpup:~/tmp/mycode > echo "env" > .gitignore
msimmons@bullpup:~/tmp/mycode > git add .gitignore # I always forget this!
msimmons@bullpup:~/tmp/mycode > source env/bin/activate
(env)msimmons@bullpup:~/tmp/mycode >
(env)msimmons@bullpup:~/tmp/mycode > which python
/home/msimmons/tmp/mycode/env/bin/python
(env)msimmons@bullpup:~/tmp/mycode > deactivate
msimmons@bullpup:~/tmp/mycode > which python
/usr/bin/python

by Matt Simmons at April 20, 2015 01:15 PM

Chris Siebenmann

An interesting trick for handling line numbers in little languages

One of the moderately annoying issues you have to deal with when writing a lexer for a language is handling line numbers. Being able to report line numbers is important for passably good error messages, but actually doing this can be a pain in the rear end.

The usual straightforward way is to have your lexer keep track of the current line number and make it available to higher levels on demand. One problem this runs into is that the lexer's current position is not necessarily where the error actually is. The simple case is languages that don't allow multi-line constructs, but even here you can wind up off by a line in some situations.

A more sophisticated approach is to include the line number (and perhaps the position in the line) as part of what you return for every token. Both the parser and the lexer can then use this to report accurate positions for everything without any problems, although the lexer still has to keep counting lines and so on.

Somewhat recently I wound up writing a lexer in Go as part of a project, and I modeled it after Rob Pike's presentation on lexing in Go. Pike's lexer uses an interesting low-rent trick for handling line numbers, although it's one that's only suitable for use with a little language. Pike's lexer is given the entire source code to lex at once, so rather than explicitly tracking line numbers it just tracks the absolute character position in the source code (which it needs anyways) and includes this absolute character position as part of the tokens. If you turn out to need the line number, you call back to the lexer with the character position and the lexer counts how many newlines there are between the start of the source and the position.
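In skeletal form the trick looks something like this (a sketch of my own, in Python rather than Go since the details don't matter): tokens carry only an absolute character offset, and the line number is worked out on demand by counting newlines.

import collections

Token = collections.namedtuple('Token', 'kind value pos')

class Lexer(object):
    def __init__(self, source):
        self.source = source    # the entire source, read in at once
        self.pos = 0            # current absolute character position

    def emit(self, kind, value):
        # Tokens record only the absolute position, not a line number.
        return Token(kind, value, self.pos)

    def line_for(self, pos):
        # Called only when something actually needs a line number (usually
        # to report an error): count the newlines before that position.
        return self.source.count('\n', 0, pos) + 1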

Ever since I saw it this has struck me as a really clever approach if you can get away with it. Not only is it really easy to implement, but it's optimized for the common case of not needing the line number at all because you're parsing something without errors. Now that I've run into it, I'll probably reuse it in all future little language lexers.

Note that this isn't a general approach for several reasons. First, serious lexers are generally stream lexers that don't first read all of the source code into memory. Second, many languages routinely want line number information for things like profiling, debugging, and exception traces (and all of these uses are well after lexing has finished). That's why I say Pike's approach here is best for little languages, where it's common to read all of the source in at once for simplicity and you generally don't have those issues.

(If I was dealing with a 'bigger' language, I think that today I would take the approach of returning the line number as part of every token. It bulks up the returned token a bit but having the line number information directly in the token makes your life simpler in the long run, as I found out from the Go parser I wrote.)

by cks at April 20, 2015 04:42 AM

April 19, 2015

Chris Siebenmann

A potential path to IPv6 (again), but probably not a realistic one today

In practice, adding IPv6 to existing networks is a lot of work and is clearly going quite slowly in many places, or even not going at all. Given the economic incentives involved, this is no surprise; currently IPv6 primarily benefits people who are not on the Internet, not people who are. So what will drive adoption of IPv6, so that it becomes available in more areas? In particular, what would push http://www.cs.toronto.edu/ towards adding IPv6 to our networks?

My current answer is that the only thing that would really make this important is a noticeable amount of IPv6 only websites and other Internet resources that people here wanted to reach. If this happens and especially if it's increasing, that would create an actual win for our users for us deploying IPv6 instead of the current situation of it just being kind of nice. But where are these IPv6 only resources going to come from?

My best guess is that the most likely place to develop them are areas with large IPv6 penetration today. If you're building a business that is primarily or entirely targeting an IPv6 enabled audience (if, for example, you're targeting mobile users in a geographic area where they all get IPv6), only going with IPv6 for your servers and so on may make your life simpler.

Unfortunately there are a lot of holes in this idea. Even if you're dealing with an area where IPv6 is better than IPv4, running a dual stack environment is probably easy enough that it's cheap insurance against needing to expand into an IPv4 audience (and it means that all sorts of IPv4 only people can at least check you out). Going dual stack does increase IPv6 usage on the whole, but it doesn't turn you into an engine driving IPv6 adoption elsewhere. Beyond that, the Wikipedia page on IPv6 deployment and APNIC's numbers suggest that I've significantly overestimated how many areas of the world are strongly IPv6 enabled at the moment. If there's no real pool of IPv6 users (especially in areas that are not already saturated with IPv4 address space), well, so much for that.

All of this does make me wonder if and when large hosting and datacenter providers will start effectively charging extra for IPv4 addresses (either explicitly or by just giving you a discount if you only want IPv6 ones). That would be both a driver and a marker of a shift to IPv6.

(I wrote about a potential path to IPv6 a while back. This is kind of a new version of that idea from a different perspective, although I had forgotten my old entry when I first had this idea.)

by cks at April 19, 2015 06:11 AM

April 18, 2015

The Lone Sysadmin

When Should I Upgrade to VMware vSphere 6?

I’ve been asked a few times about when I’m planning to upgrade to VMware vSphere 6. Truth is, I don’t know. A Magic 8 Ball would say “reply hazy, try again.” Some people say that you should wait until the first major update, like the first update pack or first service pack. I’ve always thought […]

The post When Should I Upgrade to VMware vSphere 6? appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at April 18, 2015 05:38 AM

Chris Siebenmann

A core problem of IPv6 adoption is the lack of user benefits

I've written before about some of the economic incentives involved with IPv6 adoption, focusing on who benefits from IPv6. Today I want to touch on this economic issue from another angle. Put simply, one of the big problems is this:

In many places, adding IPv6 to your network won't improve anything for your users.

Sure, from a geeky technical side it's nice to support IPv6 on your network and see ipv6.google.com and so on. Having your network and organization be IPv6 ready and enabled is clearly the right thing, a good thing for the future, and all that. But it's not essential. In fact it's usually not even beneficial, not even a little bit. If you add IPv6 to your network today, generally almost no one will notice anything different.

(Let's pretend that there are no bugs and systems that are unprepared to deal with IPv6 addresses and so on.)

At one level this is great; it's good that you can quietly drop in another network protocol and no one notices. At another level it's catastrophic to IPv6 adoption. IPv6 adoption is a lot of work in most networks; you've got a great deal to learn, a great deal to set up, a great deal to test, and so on. Unless you have a lot of free time it's hard to justify spending a lot of effort on something that doesn't actually deliver real improvements to your users, it's just the right thing to do.

(People like working on right things, but they inevitably get a low priority and thus not very much time. They're sleepy Friday and slack day and 20% time projects, not prime time work.)

Purely from a speed of adoption perspective, it would be much better if adding IPv6 was less transparent because it suddenly let people do things that they couldn't do before. Then you'd have a much easier time of building a case for spending significant effort on it.

(In fact it's my impression that many of the IPv6 adoption stories I've heard about are exactly from situations where adopting IPv6 did deliver real, tangible benefits to the organization involved. See eg Facebook's slides about their internal IPv6 usage, where IPv6 helped them deal with real issues and made their lives better.)

by cks at April 18, 2015 05:03 AM

Steve Kemp's Blog

skx-www upgraded to jessie

Today I upgraded my main web-host to the Jessie release of Debian GNU/Linux.

I performed the upgrade by changing wheezy to jessie in the sources.list file, then ran:

apt-get update
apt-get dist-upgrade

For some reason this didn't upgrade my kernel, which remained the 3.2.x version. That failed to boot, due to some udev/systemd issues (lots of "waiting for job: udev /dev/vda", etc, etc). To fix this I logged into my KVM-host, chrooted into the disk image (which I mounted via the use of kpartx), and installed the 3.16.x kernel, before rebooting into that.

All my websites seemed to be OK, but I made some changes regardless. (This was mostly for "neatness", using Debian packages instead of gems, and installing the attic package rather than keeping the source-install I'd made to /opt/attic.)

The only surprise was the significant upgrade of the Net::DNS perl-module. Nothing that a few minutes work didn't fix.

Now that I've upgraded, the SSL issue I had with redirections is no longer present. So it was a worthwhile thing to do.

April 18, 2015 12:00 AM

April 17, 2015

Chris Siebenmann

In practice, programmers mostly understand complexity by superstition

A while back I said that a significant amount of programming is done by superstition. A corollary of that is that a great deal about programming is also understood primarily through superstition and mythology, as opposed to going to the actual deep technical detail and academic definitions of things. Today I want to point specifically to complexity.

You are a good, well educated programmer, so you know what 'constant time' or O(1) in the context of hash tables really means (for example). You probably know many of the effects that can distort hash table operations from being fast; there's that 'constant time' really only refers to 'relative to the number of entries', there's pathological cases that violate this (like extensive hash collisions), there's the practical importance of constant factors, there's the time taken by other necessary operations (which may not be constant time themselves), and then there's the low-level effects of real hardware (RAM fetches, TLB fills, L2 cache misses, and so on). This is an incomplete list, because everything is complex when you dig in.
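(As a toy illustration of the hash collision case, entirely my own example: a key type whose instances all hash to the same value turns dictionary insertion from roughly linear into roughly quadratic.)

class Collider(object):
    def __init__(self, v):
        self.v = v
    def __hash__(self):
        return 42               # every key collides with every other key
    def __eq__(self, other):
        return self.v == other.v

# Each insert now has to be compared against all previously inserted keys,
# so building this dictionary is roughly O(n^2) instead of O(n).
d = {Collider(i): i for i in range(2000)}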

Most programmers either don't know about this or don't think about it very much. If something is called 'constant time', they will generally expect it to be fast and to be consistently so under most conditions (certainly normal conditions). Similar things hold for other broad complexity classes, like linear or n log n. In fact you're probably doing well if a typical programmer can even remember what complexity class things are in. Generally this doesn't matter; as before, the important thing is the end result. If you're writing code with a significantly wrong understanding of something's practical behavior and thus time profile, you're probably going to notice.

(Not always, though; see Accidentally Quadratic, especially the rationale. You can choose to see this as a broad problem with the mismatch between superstition and actual code behavior here, in that people are throwing away performance without realizing it.)

by cks at April 17, 2015 06:03 AM

April 16, 2015

Slaptijack

Python urllib2 with gzip or deflate encoding

The other night I was working on some Python code that interacts with the zKillboard API. The API call returns kill information via JSON for EVE Online. While making the request, I was getting an error that I was using "non-acceptable encoding." It turns out that the zKillboard guys require you to accept gzip or deflate encoding in order to save on bandwidth. Here's a snippet showing how I added the "Accept-encoding" header to my urllib2 request:


import json
import urllib2
 
request = urllib2.Request('https://zkillboard.com/api/kills/solarSystemID/30000142/pastSeconds/86400/')
request.add_header('Accept-encoding', 'gzip,deflate')
 
data = urllib2.urlopen(request)
 
kills = json.load(data)
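One caveat worth mentioning: urllib2 doesn't transparently decompress the response for you, so if the server actually honors the header and sends gzip, you may need to inflate the body yourself before parsing it. Here's a sketch of how that could look (untested against the zKillboard API itself):

import gzip
import io
import json
import urllib2

request = urllib2.Request('https://zkillboard.com/api/kills/solarSystemID/30000142/pastSeconds/86400/')
request.add_header('Accept-encoding', 'gzip,deflate')
response = urllib2.urlopen(request)

raw = response.read()
# urllib2 hands back the body exactly as the server sent it.
if response.info().get('Content-Encoding') == 'gzip':
    raw = gzip.GzipFile(fileobj=io.BytesIO(raw)).read()

kills = json.loads(raw)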

For all you EVE Online guys out there, the following API request will give you the kills in Jita over the last 24 hours. If you like the idea of a video game you can write code against, check out EVE Online's free trial. That link will give you 21 days free versus the usual 14.


by Scott Hebert at April 16, 2015 07:21 PM

The Tech Teapot

The evils of the dashboard merry-go-round

What is the first thing you do when you get to the office? Check your email probably. Then what? I bet you go through the same routine of checking your dashboards to see what’s happened overnight.

That is exactly what I do every single morning at work.

And I keep checking those dashboards throughout the day. Sometimes I manage to get myself in a loop, continuing around and around the same set of dashboards.

I can pretty much guarantee that nothing will have changed significantly. Certainly not to the point where some action is necessary.

How many times do you check some dashboard and then action something? Me? Very rarely, if ever.

The Problem

I call this obsessive need to view all of my various dashboards my dashboard merry-go-round.

The merry-go-round is particularly a problem with real time stats. When Google Analytics implemented real time stats I guarantee that webmaster productivity plummeted the very next day.

But the merry-go-round is not confined to Google Analytics or even website statistics. Any kind of statistics will do.

It is a miracle I manage to do any work. One of the most seductive things about the constant dashboard merry-go-round is that it feels like work. There you are frantically pounding away quickly moving between your various dashboards. It even looks to everybody else like work. Your boss probably thinks you’re being really productive.

The problem is that there is no action. There is no end result.

You log into Google Analytics, go to the “Real-Time” section, and discover that there are five people browsing your site. On pages X, Y, and Z.

So what. You are not getting any actionable information from this.

You then go to the other three sites you’ve got in Google Analytics. Rinse and repeat.

Solutions

The obvious solution is to just stop. But if it was as simple as that, you’d already have stopped and so would I.

Notifications

One of the things you are trying to do by going into your dashboards is to see what’s changed. Has anything interesting happened? One way you can short circuit this is to configure the dashboard to tell you when something interesting has happened.

I use the Pingdom service to monitor company websites and I almost never go into the Pingdom dashboard. Why? Because I’ve configured the service to send me a text message to my mobile whenever a website goes down. If I don’t receive a text, then there’s no need to look at the dashboard.

Notifications do come with a pretty big caveat: you must trust your notifications. If I can’t rely on Pingdom to tell me when my sites are down, I may be tempted to go dashboard hunting.

Even if your systems are only capable of sending email alerts, you can still turn those into mobile friendly notifications and even phone call alerts using services like PagerDuty, OpsGenie and Victorops.

Instead of needing a mobile phone app for each service you run, you can merge all of your alerts into a single app with mobile notifications.

If you haven’t received a notification, then you don’t need to go anywhere near the dashboard. Every time you think about going there just remind yourself that, if there was a problem, you’d already know about it.

Time Management Techniques

Pomodoro Technique or Timeboxing or … lots of other similar techniques.

None of the time management techniques will of themselves cure your statsitis; they just move it into your own “pomodori”, or the time in between your productive tasks. If you want to sacrifice your own leisure time to stats, then go nuts. But at least you’re getting work done in the meantime.

The Writers’ Den

One technique writers use to be more productive is the writer’s den. The idea is you set aside a space with as few distractions as possible, including a PC with just the software you need to work and that isn’t connected to the internet.

Not a bad idea, but unfortunately, a lot of us can’t simply switch the internet off. We either work directly with internet connected applications or we need to use reference materials only available on the web.

As ever, there are software solutions that can restrict access to sites at pre-defined times of the day. You could permit yourself free access for the first half hour of work, and then again just before you leave. Leaving a large chunk of your day for productive work.

The drawback with systems that restrict access is that, as you are likely to be the person setting up the system, you’re also likely to be able to circumvent it quite easily too.

Conclusion

For me, dashboards and stats can be both a boon and a curse at the same time. They can hint at problems and actions you need to take, but they can also suck an awful lot of time out of your day.

I’ve found that a combination of time management and really good notifications are a great way to stop the dashboard merry-go-round and put stats into their rightful place. A tool to help you improve, not an end in themselves.

The problem with concentrating too much on statistics is that it is so seductive. It feels like you are working hard but, in the end, if there are no actions coming out of the constant stats watching, then it is all wasted effort.

If you have any hints and tips how you overcame your dashboard merry-go-round, please leave a comment. :)

[Picture is of a French old-fashioned style carousel with stairs in La Rochelle by Jebulon (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons]

by Jack Hughes at April 16, 2015 03:26 PM

Chris Siebenmann

Are Python dictionaries necessarily constant-time data structures?

The general view of all forms of hash tables, Python's dictionaries included, is that they are essentially constant-time data structures under normal circumstances. This is not quite true under sufficiently perverse conditions where you have a high degree of collisions in the hashes of the keys, but let's assume that you don't have that for the moment. Ignoring hash collisions, can you treat dictionaries as fast constant-time data structures?

The answer is 'not always', and the path to it has some interesting and perhaps surprising consequences. The potential problem is custom hash functions. If the objects you're using as dictionary keys have a __hash__ method, this method must be called to return the object hash. This is Python code, so it's not necessarily going to be fast by comparison with regular dictionary operations. It may also take a visibly non-constant amount of time, depending on just how it's computing the hash.

(For instance, hashing even Python strings is actually not a constant time operation; it's linear with the length of the string. It's just that all of the code is in C, so by normal standards you're never going to notice the time differences.)

One of the things that you may want to consider as a result of this is memoizing the results of any expensive __hash__ method. This is what Python strings do; if you call hash() on a string, the actual hash computation is done only once and afterwards the already computed (and saved) hash value is just repeated back to you. This only works for things with immutable hash values, but then if your objects have hash values at all they should be immutable ones.
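(A minimal sketch of that memoization pattern, my own illustration rather than anything from the standard library: compute the expensive hash once, cache it on the instance, and hand back the cached value afterwards.)

class Fingerprint(object):
    def __init__(self, data):
        self.data = tuple(data)     # immutable, so caching the hash is safe
        self._hash = None

    def __hash__(self):
        if self._hash is None:
            # Stand-in for an expensive pure-Python computation.
            h = 0
            for item in self.data:
                h = (h * 31 + hash(item)) & 0xffffffff
            self._hash = h
        return self._hash

    def __eq__(self, other):
        return isinstance(other, Fingerprint) and self.data == other.data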

The real answer is that all of this is relatively theoretical. I'm pretty sure that almost no one uses complex custom __hash__ functions for objects defined in Python, although it seems relatively common to define simple ones that just delegate to the hash of another object (probably mostly or always a primitive object with a fast C level hash function). And if you do have objects with complex __hash__ functions that take noticeable amounts of time, you're probably not going to be using them as dictionary keys or set members very often because if you do, well, you'll notice.

On the other hand, the amount of work that the standard library's decimal.Decimal does in its __hash__ function is a little alarming (especially in Python 2.7). Having looked, I wouldn't encourage using them as dictionary keys or set members any time soon, at least not in high-volume dictionaries or sets. The Python 3 version of datetime is another potentially interesting case, since it does a certain amount of grinding away in Python __hash__ functions.

(In Python 2.7, datetime is a C-level module so all of its hashing operations presumably go really fast in general.)

Sidebar: Custom hashes and the Global Interpreter Lock

Let's ask another question: is adding a new key and value to a dictionary an atomic operation that's inherently protected by the GIL? After all, the key might have a custom __hash__ function that runs Python code (and thus bytecode) during any dictionary operation. As far as I can tell from peering at the CPython code, the answer is more or less yes. Although dictionary or set operations may require calling Python code for __hash__ (and for that matter for custom __eq__ methods as well), this is all done conceptually 'before' the actual dictionary modification takes place. The actual modification happens all at once, so you'll never see a dictionary with eg a key set but not its corresponding value.

This does mean that writing 'dct[ky] = val' may involve much more Python bytecode running than you expect (and thus a much higher chance that Python switches to another thread before the new key and value are added to the dictionary). But it's always been the case that Python might switch to another thread at almost any bytecode, so this hasn't created a new race, just widened the time window of an existing one you already had.
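
To make the 'more bytecode than you expect' point concrete, here is a minimal sketch (again with an invented class) of Python-level code running as part of an ordinary dictionary assignment and lookup:

class NoisyKey(object):
    # Invented example: count how often the Python-level __hash__ runs.
    def __init__(self, name):
        self.name = name
        self.hash_calls = 0

    def __hash__(self):
        self.hash_calls += 1          # Python bytecode, run mid-operation
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, NoisyKey) and self.name == other.name

dct = {}
k = NoisyKey("example")
dct[k] = 1            # calls k.__hash__() before the (atomic) insert
val = dct[k]          # and again for the lookup
print(k.hash_calls)   # 2

All of that Python code runs before the actual modification, which is why the modification itself still looks atomic even though the window for a thread switch is wider than the bare assignment suggests.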

by cks at April 16, 2015 04:29 AM

April 15, 2015

LZone - Sysadmin

Benchmarking Redis and Memcache



If you ever need to get some meaningful facts in a possible Redis vs memcached discussion you might want to benchmark both on your target system.

Redis ships with a benchmarking tool, redis-benchmark, while memcached doesn't. But Redis author Salvatore Sanfilippo ported the Redis benchmark to memcached! So it is possible to measure quite similar metrics, using the same math and result summaries, for both key value stores.

Benchmarking Redis

So set up Redis in cluster mode, master/slave, or whatever you like, and run the Redis benchmark
apt-get install redis-tools	# available starting with Wheezy backports
redis-benchmark -h <host>

Benchmarking Memcached

And do the same for memcached by compiling the memcached port of the benchmark
apt-get install build-essential libevent-dev
git clone https://github.com/antirez/mc-benchmark.git
cd mc-benchmark
make
and running it with
./mc-benchmark -h <host>
The benchmark output has the same structure for both, though the Redis version produces more of it: each command type is tested, and the Redis protocol has many more commands than memcached.

April 15, 2015 09:16 PM

Puppet Check ERBs for Dynamic Scoping

If you ever need to upgrade a code base to Puppet 3.0 and strip all dynamic scoping from your templates:
for file in $(find . -name "*.erb" | sort); do
    echo "------------ [ $file ]"
    # Multi-line ERB tags: keep everything from a "%" up to the closing "%>"
    if grep -q "%[^>]*$" "$file"; then
        content=$(sed '/%/,/%>/!d' "$file")
    else
        content=$(grep "%" "$file")
    fi
    # Flag loops, conditionals and output tags that don't already use
    # explicit scoping (scope.lookupvar, @var or scope[...])
    echo "$content" | egrep "(.each|if |%=)" | egrep -v "scope.lookupvar|@|scope\["
done


This is of course just a fuzzy match, but it should catch quite a few of the dynamic scope expressions out there. The limits of this solution are:
  • false positives on loop variables and locally declared variables that must not be scoped
  • and false negatives when correct and missing scoping are mixed on the same line.
So use with care.

April 15, 2015 09:16 PM

Splunk Cheat Sheet

Basic Searching Concepts

Simple searches look like the following examples. Note that there are literals with and without quoting and that there are field selections with an "=":
Exception                # just the word
One Two Three            # those three words in any order
"One Two Three"          # the exact phrase


# Filter all lines where field "status" has value 500 from access.log
source="/var/log/apache/access.log" status=500

# Give me all fatal errors from syslog of the blog host
host="myblog" source="/var/log/syslog" Fatal

Basic Filtering

Two important filters are "rex" and "regex".

"rex" is for extraction a pattern and storing it as a new field. This is why you need to specifiy a named extraction group in Perl like manner "(?...)" for example
source="some.log" Fatal | rex "(?i) msg=(?P[^,]+)"
When running above query check the list of "interesting fields" it now should have an entry "FIELDNAME" listing you the top 10 fatal messages from "some.log"

What is the difference from "regex", then? Well, "regex" is like grep. Actually you can rephrase
source="some.log" Fatal
to
source="some.log" | regex _raw=".*Fatal.*"
and get the same result. The syntax of "regex" is simply "<field>=<regular expression>". Using it makes sense once you want to filter on a specific field.

Calculations

Sum up a field and do some arithmetic:
... | stats sum(<field>) as result | eval result=(result/1000)
Determine the size of log events by checking len() of _raw. The p10() and p90() functions return the 10th and 90th percentiles:
| eval raw_len=len(_raw) | stats avg(raw_len), p10(raw_len), p90(raw_len) by sourcetype

Simple Useful Examples

Splunk usually auto-detects access.log fields so you can do queries like:
source="/var/log/nginx/access.log" HTTP 500
source="/var/log/nginx/access.log" HTTP (200 or 30*)
source="/var/log/nginx/access.log" status=404 | sort - uri 
source="/var/log/nginx/access.log" | head 1000 | top 50 clientip
source="/var/log/nginx/access.log" | head 1000 | top 50 referer
source="/var/log/nginx/access.log" | head 1000 | top 50 uri
source="/var/log/nginx/access.log" | head 1000 | top 50 method
...

Emailing Results

By appending "sendemail" to any query you get the result by mail!
... | sendemail to="john@example.com"

Timecharts

Create a timechart from a single field that should be summed up
... | table _time, <field> | timechart span=1d sum(<field>)
... | table _time, <field>, name | timechart span=1d sum(<field>) by name

Index Statistics

List All Indices
 | eventcount summarize=false index=* | dedup index | fields index
 | eventcount summarize=false report_size=true index=* | eval size_MB = round(size_bytes/1024/1024,2)
 | REST /services/data/indexes | table title
 | REST /services/data/indexes | table title splunk_server currentDBSizeMB frozenTimePeriodInSecs maxTime minTime totalEventCount
on the command line you can call
$SPLUNK_HOME/bin/splunk list index
To query the amount written per index, the metrics.log can be used:
index=_internal source=*metrics.log group=per_index_thruput series=* | eval MB = round(kb/1024,2) | timechart sum(MB) as MB by series
MB per day per indexer / index
index=_internal metrics kb series!=_* "group=per_host_thruput" monthsago=1 | eval indexed_mb = kb / 1024 | timechart fixedrange=t span=1d sum(indexed_mb) by series | rename sum(indexed_mb) as totalmb


index=_internal metrics kb series!=_* "group=per_index_thruput" monthsago=1 | eval indexed_mb = kb / 1024 | timechart fixedrange=t span=1d sum(indexed_mb) by series | rename sum(indexed_mb) as totalmb

April 15, 2015 09:16 PM

Racker Hacker

Rackspace::Solve Atlanta Session Recap: “The New Normal”

This post originally appeared on the Rackspace Blog and I’ve posted it here for readers of this blog. Feel free to send over any comments you have!


Most IT professionals would agree that 2014 was a long year. Heartbleed, Shellshock, Sandworm and POODLE were just a subset of the vulnerabilities that caused many of us to stay up late and reach for more coffee. As these vulnerabilities became public, I found myself fielding questions from non-technical family members after they watched the CBS Evening News and wondered what was happening. Security is now part of the popular discussion.

Aaron Hackney and I delivered a presentation at Rackspace::Solve Atlanta called “The New Normal” where we armed the audience with security strategies that channel spending to the most effective security improvements. Our approach at Rackspace is simple and balanced: use common sense prevention strategies, invest heavily in detection, and be sure you’re ready to respond when (not if) disaster strikes. We try to help companies prioritize by focusing on a few key areas. Know when there’s a breach. Know what they touched. Know who’s responsible. Below, I’ve included five ways to put this approach into practice.

First, common sense prevention includes using industry best practices like system and network hardening standards. Almost every device provides some kind of logging but we rarely review the logs and we often don’t know which types of events should trigger suspicion. Monitoring logs, securely configuring devices, and segmenting networks will lead to a great prevention strategy without significant costs (in time or money).

Second, many businesses will overspend on more focused prevention strategies before they know what they’re up against. This is where detection becomes key. Intrusion detection systems, log management systems, and NetFlow analysis can give you an idea of where an intruder might be within your systems and what they may have accessed. Combining these systems allows you to thwart the more advanced attackers that might use encrypted tunnels or move data via unusual protocols (like exfiltration via DNS or ICMP).

Third, when an incident does happen, everyone needs to know their place, including employees, partners, and customers. Every business needs a way to communicate incident severity without talking about the incident in great detail. If you’ve seen the movie WarGames, you probably remember them changing DEFCON levels at NORAD. Everyone knew their place and their duties whenever the DEFCON level changed, even if they didn’t know the specific nature of the incident. Think about how you will communicate when you really can’t — this is critical.

Fourth, the data gathered by the layers of detection, combined with the root cause analysis (RCA) from the incident response, will show you where to spend on additional prevention. RCA will also give you the metrics you need for conversations with executives about security changes.

One last tip – when you think about changes, opt for a larger number of smaller changes. The implementation will be less expensive and the probability of employee and customer backlash is greatly reduced.

For more tips on making changes within a company, I highly recommend reading Switch: How to Change When Change Is Hard.

We’d like to thank all of the Solve attendees who joined us for our talk. The questions after the talk were great and they led to plenty of hallway conversations afterwards. We hope to see you at a future Solve event!

The post Rackspace::Solve Atlanta Session Recap: “The New Normal” appeared first on major.io.

by Major Hayden at April 15, 2015 02:00 PM

Standalone Sysadmin

Spinning up a quick cloud instance with Digital Ocean

This is another in a short series of blog posts that will be brought together like Voltron to make something even cooler, but it’s useful on its own. 

I’ve written about using a couple other cloud providers before, like AWS and the HP cloud, but I haven’t actually mentioned Digital Ocean yet, which is strange, because they’ve been my go-to cloud provider for the past year or so. As you can see on their technology page, all of their instances are SSD backed, they’re virtualized with KVM, they’ve got IPv6 support, and there’s an API for when you need to automate instance creation.

To be honest, I’m not automating any of it. What I use it for is one-off tests. Spinning up a new “droplet” takes less than a minute, and unlike AWS, where there are a ton of choices, I click about three buttons and get a usable machine for whatever I’m doing.

To get the most out of it, the first step is to generate an SSH key if you don’t have one already. If you don’t set up key-based authentication, you’ll get the root password for your instance in your email, but ain’t nobody got time for that, so create the key using ssh-keygen (or if you’re on Windows, I conveniently covered setting up key-based authentication using pageant the other day – it’s almost like I’d planned this out).

Next, sign up for Digital Ocean. You can do this at DigitalOcean.com or you can get $10 for free by using my referral link (and I’ll get $25 in credit eventually).  Once you’re logged in, you can create a droplet by clicking the big friendly button:

This takes you to a relatively limited number of options – but limited in this case isn’t bad. It means you can spin up what you want without fussing about most of the details. You’ll be asked for your droplet’s hostname (which will both be used to refer to the instance in the Digital Ocean interface and actually be set as the hostname of the created machine), and you’ll need to specify the size of the machine you want (and at the current moment, here are the prices:)

The $10/mo option is conveniently highlighted, but honestly, most of my test stuff runs perfectly fine on the $5/mo, and most of my test stuff never runs for more than an hour, and 7/1000 of a dollar seems like a good deal to me. Even if you screw up and forget about it, it’s $5/mo. Just don’t set up a 64GB monster and leave that bad boy running.

Next there are several regions. For me, New York 3 is automatically selected, but I can override that default choice if I want. I just leave it, because I don’t care. You might care, especially if you’re going to be providing a service to someone in Europe or Asia.

The next options are for settings like Private Networking, IPv6, backups, and user data. Keep in mind that backups cost money (duh?), so don’t enable that feature for anything you don’t want to spend 20% of your monthly fee on.

The next option is honestly why I love Digital Ocean so much. The image selection is so painless and easy that it puts AWS to shame. Here:

You can see that the choice defaults to Ubuntu current stable, but look at the other choices! Plus, see that Applications tab? Check this out:

I literally have a GitLab install running permanently in Digital Ocean, and the sum total of my efforts was 5 seconds of clicking that button, and $10/mo (it requires a gig of RAM to run the software stack). So easy.

It doesn’t matter what you pick for spinning up a test instance, so you can go with the Ubuntu default or pick CentOS, or whatever you’d like. Below that selection, you’ll see the option for adding SSH keys. By default, you won’t have any listed, but you have a link to add a key, which pops open a text box where you can paste your public key text. The key(s) that you select will be added to the root user’s ~/.ssh/authorized_keys file, so that you can connect in without knowing the password. The machine can then be configured however you want. (Alternately, when selecting which image to spin up, you can spin up a previously-saved snapshot, backup, or old droplet which can be pre-configured (by you) to do what you need).

Click Create Droplet, and around a minute later, you’ll have a new entry in your droplet list that gives you the public IP to connect to. If you spun up a vanilla OS, SSH into it as the root user with one of the keys you specified, and if you selected one of the apps from the menu, try connecting to it over HTTP or HTTPS.

That’s really about it. In an upcoming entry, we’ll be playing with a Digital Ocean droplet to do some cool stuff, but I wanted to get this out here so that you could start playing with it, if you don’t already. Make sure to remember, though, whenever you’re done with your machine, you need to destroy it, rather than just shut it down. Shutting it down makes it unavailable, but keeps the data around, and that means you’ll keep getting billed for it. Destroy it and that erases the data and removes the instance – and it’s the instance that you get billed for.
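
If you ever want to script that last step, the sketch below shows one way to destroy a droplet through the Digital Ocean API. Treat the details as assumptions to check against their API documentation: it assumes the v2 endpoint DELETE /v2/droplets/<id> with a personal access token, and the token and droplet id shown are of course placeholders.

import requests   # third-party HTTP library (pip install requests)

# Placeholder values -- substitute your own API token and droplet id.
TOKEN = "your-personal-access-token"
DROPLET_ID = 4242

resp = requests.delete(
    "https://api.digitalocean.com/v2/droplets/%d" % DROPLET_ID,
    headers={"Authorization": "Bearer %s" % TOKEN},
)
print(resp.status_code)   # 204 No Content is what you want to see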

Have fun, and let me know if you have any questions!

by Matt Simmons at April 15, 2015 01:00 PM

Chris Siebenmann

Illusory security is terrible and is worse than no security

One of the possible responses to my entry on how your entire download infrastructure should be using HTTPS is to say more or less 'well, at least the current insecure approach is trying, surely that's better than ignoring the whole issue'. My answer is simple: no, it's not. The current situation covered in my entry is actually worse than not having any PGP signatures (and perhaps SHA1 hashes) at all.

In general, illusory security is worse than no security because in practice, illusory security fools people and so lulls them into a false sense of security. I'm pretty sure that almost everyone who does anything at all is going to read the Joyent page, faithfully follow the directions, and conclude that they're secure. As we know, all of their checking actually means almost nothing. In fact I'm pretty sure that the Joyent people who set up that page felt that it creates security.

What makes no security better than illusory security is that it's honest. If Joyent just said 'download this tarball from this HTTP URL', everyone would have the same effective security but anyone who was worried about it would know immediately that they have a problem. No one would be getting a false sense of security; instead they would have an honest sense of a lack of security.

It follows that if you're setting up security, it's very important to get it right. If you're not confident that you've got it right, the best thing you can do is shut up about it and not say anything. Do as much as you can to not lead people into a false sense of security, because almost all of them will follow you if you do.

(Of course this is easier said than done. Most people set out to create good security instead of illusory security, so there's a natural tendency to believe that you've succeeded.)

PS: Let me beat the really security-aware people to the punch by noting that an attacker can always insert false claims of security even if you leave them out yourself; since you don't have security, your lack of claims of it is delivered insecurely and so is subject to alteration. It's my view that such alterations are likely to be more dangerous for the attacker over the long term for various reasons. (If all they need is a short-term win, well, you're up the creek. Welcome to security, land of justified paranoia.)

by cks at April 15, 2015 04:35 AM

April 14, 2015

Racker Hacker

Woot! Eight years of my blog

The spring of 2015 marks eight years of this blog! I’ve learned plenty of tough lessons along the way and I’ve made some changes recently that might be handy for other people. After watching Sasha Laundy’s video from her awesome talk at Pycon 2015, I’m even more energized to share what I’ve learned with other people. (Seriously: Go watch that video or review the slides whether you work in IT or not. It’s worth your time.)

Let’s start from the beginning.

History Lesson

When I started at Rackspace in late 2006, I came from a fairly senior role at a very small company. I felt like I knew a lot and then discovered I knew almost nothing compared to my new coworkers at Rackspace. Sure, some of that was impostor syndrome kicking in, but much of it was due to being in the right place at the right time. I took a lot of notes in various places: notebooks, Tomboy notes, and plain text files. It wasn’t manageable and I knew I needed something else.

Many of my customers were struggling to configure various applications on LAMP stacks and a frequent flier on my screen of tickets was WordPress. I installed it on a shared hosting account and began tossing my notes into it instead of the various other places. It was a bit easier to manage the content and it came with another handy feature: I could share links with coworkers when I knew how to fix something that they didn’t. In the long run, this was the best thing that came out of using WordPress.

Fast forward to today and the blog has more than 640 posts, 3,500 comments, and 100,000 sessions per month. I get plenty of compliments via email along with plenty of criticism. Winston Churchill said it best:

Criticism may not be agreeable, but it is necessary. It fulfils the same function as pain in the human body. It calls attention to an unhealthy state of things.

I love all the comments and emails I get — happy or unhappy. That’s what keeps me going.

Now Required: TLS (with optional Perfect Forward Secrecy)

I’ve offered encrypted connections on the blog for quite some time, but it’s now a hard requirement. TLS 1.0, 1.1 and 1.2 are supported, and ciphers that support Perfect Forward Secrecy (PFS) are preferred over those that don’t. For the super technical details, feel free to review a scan from Qualys’ SSL Labs.

You might be asking: “Why does a blog need encryption if I’m just coming by to read posts?” My response is “Why not?”. The cost for SSL certificates in today’s market is extremely inexpensive. For example, you can get three years on a COMODO certificate at CheapSSL for $5 USD per year. (I’m a promoter of CheapSSL — they’re great.)

Requiring encryption doesn’t add much overhead or load time but it may prevent someone from reading your network traffic or slipping in malicious code along with the reply from my server. Google also bumps up search engine rankings for sites with encryption available.

Moved to nginx

Apache has served up this blog exclusively since 2007. It’s always been my go-to web server of choice, but I’ve taken some deep dives into nginx configuration lately. I’ve moved the blog over to a Fedora 21 virtual machine (on a Fedora 21 KVM hypervisor) running nginx, with PHP handled by php-fpm. It’s also using nginx’s fastcgi_cache, which has really surprised me with its performance. Once a page is cached, I’m able to drag out about 800-900 Mbit/sec using ab.

Another added benefit from the change is that I’m now able to dump my caching-related plugins from WordPress. That means I have less to maintain and less to diagnose when something goes wrong.

Thanks!

Thanks for all of the emails, comments, and criticism over the years. I love getting those emails that say “Hey, you helped me fix something” or “Wow, I understand that now”. That’s what keeps me going. ;)

The post Woot! Eight years of my blog appeared first on major.io.

by Major Hayden at April 14, 2015 06:53 PM

Chris Siebenmann

Allowing people to be in more than 16 groups with an OmniOS NFS server

One of the long standing problems with traditional NFS is that the protocol only uses 16 groups; although you can be in lots of groups on the client (and on the server), the protocol itself only allows the client to tell the server about 16 of them. Recent versions of Illumos added a workaround (based on the Solaris one) where the server will ignore the list of groups the client sent it and look up the UID's full local group membership. Well, sometimes it will do this, if you get all of the conditions right.

There are two conditions. First, the request from the client must have a full 16 groups in it. This is normally what should happen if GIDs are synchronized between the server and the clients, but in exceptional cases you should watch out for this; if the client sends only 15 groups the server won't do any lookups locally and so can deny permissions for a file you actually have access to based on your server GID list.

Second and less obviously, the server itself must be explicitly configured to allow more than 16 groups. This is the kernel tunable ngroups_max, set in /etc/system:

set ngroups_max = 64

Any number larger than 16 will do, although you want it to cover the maximum number of groups you expect people to be in. I don't know if you can set it dynamically with mdb, so you probably really want to plan ahead on this one. On the positive side, this is the only server side change you need to make; no NFS service parameters need to be altered.

(This ngroups_max need is a little bit surprising if you're mostly familiar with other Unixes, which generally have much larger out of the box settings for this.)

This Illumos change made it into the just-released OmniOS r151014 but is not in any earlier version as far as I know. Anyways, r151014 is a LTS release so you probably want to be using it. I don't know enough about other Illumos distributions like SmartOS and Nexenta's offering to know when (or if) this change made it into them.

(The actual change is Illumos issue 5296 and was committed to the Illumos master in November 2014. The issue has a brief discussion of the implementation et al.)

Note that as far as I know the server and the client do not need to agree on the group list, provided that the client sends 16 groups. My test setup for this actually had me in exactly 16 groups on the client and some additional groups on the server, and it worked. This is a potential gotcha if you do not have perfect GID synchronization between server and client. You should, of course, but every so often things happen and things go wrong.

by cks at April 14, 2015 04:36 AM

Steve Kemp's Blog

Subject - Verb Agreement

There's pretty much no way that I can describe the act of cutting a live, 240V mains-voltage, wire in half with a pair of scissors which doesn't make me look like an idiot.

Yet yesterday evening that is exactly what I did.

There were mitigating circumstances, but trying to explain them would make little sense unless you could see the scene.

In conclusion: I'm alive, although I almost wasn't.

My scissors? They have a hole in them.

April 14, 2015 12:00 AM

April 13, 2015

TaoSecurity

Example of Chinese Military Converging on US Military

We often hear of vulnerabilities in the US military introduced by net-centric warfare and a reliance on communications networks. As the Chinese military modernizes, it will introduce similar vulnerabilities.

I found another example of this phenomenon courtesy of Chinascope:

PLA Used its Online Purchasing Website for its First Online Purchase

Written by LKY and AEF   

Xinhua reported that, on April 7, the PLA announced that five manufacturers won the bidding, totaling 90 million yuan (US$14.48 million), to supply general and maintenance equipment to the PLA. The article said that these were the first purchase orders that the PLA received since it launched its military equipment purchasing website in January. The site is at http://www.weain.mil.cn/.

The PLA claimed that it saved close to 12 million yuan (US$1.93 million) compared to the list price. The purchase order consisted of items such as containers for maintenance equipment and tools, gas masks, carrier cases, and army field lighting. The article said that the PLA equipment purchasing website was launched on January 4. On February 25, the PLA General and Maintenance department made a public announcement on the website calling for bids. On March 19, the public bidding was held at Ordnance Engineering College in Shijiazhuang City of Hebei Province. 

Over 20 manufacturers submitted bids and 5 of them, including some privately owned companies, won the bidding.

Source: Xinhua, April 12, 2015
http://news.xinhuanet.com/info/2015-04/12/c_134143641.htm

(emphasis added)

You can imagine the sorts of opportunities this story presents to adversaries, including impersonating the Chinese Web site, phishing either party (supplier or purchaser), and so on.

I expect other militaries to introduce similar vulnerabilities as they modernize, presenting more opportunities for their adversaries.

by Richard Bejtlich (noreply@blogger.com) at April 13, 2015 05:33 PM

Network Security Monitoring Remains Relevant

Cylance blogged today about a Redirect to SMB problem found in many Windows applications. Unfortunately, it facilitates credential theft. Steve Ragan wrote a good story discussing the problem. Note this issue does not rely on malware, at least not directly. It's a problem with Microsoft's Server Message Block protocol, with deep historical roots.

(Mitigating Service Account Credential Theft on Windows [pdf] is a good paper on mitigation techniques for a variety of SMB problems.)

Rather than discussing the technical problem, I wanted to make a different point. After reading about this technique, you probably want to know when an intruder uses it against you, so you can see it and preferably stop it.

However, you should be wondering if an intruder has already used it against you.

If you are practicing network security monitoring (described most recently in my newest book), then you should already be collecting network-based evidence of this attack.

  • You could check session data and infer that outbound traffic using traditional SMB ports like 139 or 445 TCP is likely evidence of attack. 
  • You could review transaction data for artifacts of SMB traffic, looking for requests and replies. 
  • Best of all, you could review full content data directly for SMB traffic, and see exactly what happened. 

Whenever you see a discussion of a new attack vector, you will likely think "how do I stop it, or at least see it?"

Don't forget to think about ways to determine if an attacker has already used it against you. Chances are that certain classes of intruders have been exercising it for days, weeks, months, or perhaps years before it surfaced in the media.

PS: This post may remind you of my late 2013 post Linux Covert Channel Explains Why NSM Matters.

by Richard Bejtlich (noreply@blogger.com) at April 13, 2015 03:25 PM

Chris Siebenmann

One speed limit on your ability to upgrade your systems

One of the responses on Twitter to Ted Unangst's long term support considered harmful was this very interesting tweet:

[...] it's not "pain" - it just doesn't happen. At 2 weeks of planning + testing = 26 systems per year

This was eye-opening in a 'I hadn't thought about it that way before now' way. Like many insights, it's blindingly obvious in retrospect; of course how fast you can actually do an upgrade/update cycle determines how many of them you can do in a year (given various assumptions about manpower, parallelism, testing, and so on). And of course this limit applies across all of your systems. It's not just that you can only upgrade a given system so many times a year; it's that you get only so many upgrades in a year, period, across all of your systems.

(What the limit is depends very much on what systems you're trying to upgrade, since the planning, setup, and testing process will take different amounts of time for different systems.)

To upgrade systems more frequently, you have two options. First, you can reduce the time an upgrade cycle takes by speeding up or doing less planning, building, testing, and/or the actual deployment. Second, you can reduce the number of upgrades you need to do by creating more uniform systems, so you amortize the time a cycle takes across more systems. If you have six special snowflakes running completely different OSes and upgrading each OS takes a month, you get twelve snowflake upgrades in a year (assuming you do nothing else). But if all six run the same OS in the same setup, you now get to upgrade all six of them more or less once a month (let's optimistically assume that deployment is a snap).
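
To make the arithmetic explicit, here is the six-systems example as a few lines of Python (the numbers are just the ones already used above, nothing more):

# One month of planning, building and testing per upgrade cycle.
months_per_cycle = 1
cycles_per_year = 12 // months_per_cycle      # 12 upgrade cycles a year

snowflakes = 6                                # six different OSes, one cycle each
print(cycles_per_year // snowflakes)          # -> 2 upgrades per snowflake per year

# With one uniform OS, a single cycle covers all six machines, so every
# machine gets refreshed roughly once a month.
print(cycles_per_year)                        # -> 12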

I see this as an interesting driver of uniformity (and at all levels, not just at the system level). Depending on how much pre-production testing you need and use, it's also an obvious driver of faster, better, and often more automated tests.

(Looking back I can certainly see cases where this 'we can only work so fast' stuff has been a limiting factor in our own work.)

by cks at April 13, 2015 02:10 AM

April 12, 2015

That grumpy BSD guy

Solaris Admins: For A Glimpse Of Your Networking Future, Install OpenBSD

Yet another proprietary tech titan turns to the free OpenBSD operating system as their source of innovation in the networking and security arena.

Roughly a week ago, on April 5th, 2015, parts of Oracle's roadmap for upcoming releases of their Solaris operating system were leaked in a message to the public OpenBSD tech developer mailing list. This is notable for several reasons; one is that Solaris, then owned and developed by (the now defunct) Sun Microsystems, was the original development platform for Darren Reed's IP Filter, more commonly known as IPF, which in turn was the software PF was designed to replace.

IPF was the original firewall in OpenBSD, and had at the time also been ported to NetBSD, FreeBSD and several other systems. However, over time IPF appears to have fallen out of favor almost everywhere, and as the (perhaps not quite intended as such) announcement has it,

IPF in Solaris is on its death row.
Which can reasonably be taken to mean that Oracle, like the OpenBSD project back in 2001 but possibly not for the same reasons, are abandoning the legacy IP Filter code base and moving on to something newer:

PF in 11.3 release will be available as optional firewall. We hope to make PF default (and only firewall) in Solaris 12. You've made excellent job, your PF is crystal-clear design.
Perhaps due to Oracle's practice of putting beta testers under non-disclosure agreements, or possibly because essentially no tech journalists ever read OpenBSD developer-focused mailing lists, Oracle's PF plans have not generated much attention in the press.

I personally find it quite interesting that the Oracle Solaris team are apparently taking in the PF code from OpenBSD. As far as I'm aware release dates for Solaris 11.3 and 12 have not been announced yet, but looking at the release cycle churn (check back to the Wikipedia page's Version history section), it's reasonable to assume that the first Solaris release with PF should be out some time in 2015.

The OpenBSD packet filter subsystem PF is not the first example of OpenBSD-originated software ending up in other projects or even in commercial, proprietary products.

Basically every Unix out there ships some version of OpenSSH, which is developed and maintained as part of the OpenBSD project, with a -portable flavor maintained in sync for others to use (a model that has been adopted by several other OpenBSD associated projects such as the OpenBGPD routing daemon, the OpenSMTPD mail daemon, and most recently, the LibreSSL TLS library). The portable flavors have generally seen extensive use outside the OpenBSD sphere, such as in Linux distributions and other Unixes.

The interesting thing this time around is that Oracle are apparently now taking their PF code directly from OpenBSD, in contrast to earlier code recipients such as Blackberry (who became PF consumers via NetBSD) and Apple, whose main interface with the world of open source appears to be the FreeBSD project, except for the time when the FreeBSD project was a little too slow in updating their PF code, ported over a fresher version and added some features under their own, non-compatible license.

Going back to the possibly unintended announcement, the fact that the Oracle developers produced a patch against OpenBSD-current, which was committed only a few days later, indicates that most likely they are working with fairly recent code and are probably following OpenBSD development closely.

If Oracle, or at least the Solaris parts of their distinctly non-diminutive organization, have started waking up to the fact that OpenBSD-originated software is high quality, secure stuff, we'll all be benefiting. Many of the world's largest corporations and government agencies are heavy Solaris users, meaning that even if you're neither an OpenBSD user or a Solaris user, your kit is likely interacting intensely with both kinds, and with Solaris moving to OpenBSD's PF for their filtering needs, we will all be benefiting even more from the OpenBSD project's emphasis on correctness, quality and security in the released OpenBSD code.

If you're a Solaris admin who's wondering what this all means to you, you can do several things to prepare for the future. One is to install OpenBSD somewhere (an LDOM in a spare corner of your T-series kit or an M-series domain will do, as will most kinds of x86ish kit) - preferably, also buying a CD set.

A second possibly smart action (and I've been dying to say this for a while to Solaris folks) is to buy The Book of PF -- recently updated to cover new features such as the traffic shaping system.

And finally, if you're based in North America (or if your boss is willing to fly you to Ottawa in June anyway), there's a BSDCan tutorial session you probably want to take a closer look at, featuring yours truly. Similar sessions elsewhere may be announced later, watch the Upcoming talks section on the upper right. If you're thinking of going to Ottawa or my other sessions, you may want to take a peek at my notes on tutorials originally for two earlier BSDCan sessions.

Update 2015-04-15: Several commenters and correspondents have asked two related questions: "Will Oracle contribute code and patches back?" and "Will Oracle donate to OpenBSD?". The answer to the first question is that it looks like they've already started. Which is of course nice. Bugfixes and well implemented feature enhancements are welcome, as long as they come under an acceptable license. The answer to the second question is, we don't know yet. It probably won't hurt if the Oracle developers themselves as well as Solaris users start pointing the powers that be at Oracle in the direction of the OpenBSD project's Donations page, which outlines several useful approaches to help financing the project.


If you find this or other articles of mine useful, irritating, enlightening or otherwise and want me and the world to know about it, please use the comments feature. Response will likely be quicker there than if you send me email.

by noreply@blogger.com (Peter N. M. Hansteen) at April 12, 2015 07:53 PM

TaoSecurity

Please Support OpenNSM Group

Do you believe in finding and removing intruders on the network before they cause damage? Do you want to support like-minded people? If you answered "yes," I'd like to tell you about a group that shares your views and needs your help.

In August 2014, Jon Schipp started the Open (-Source) Network Security Monitoring Group (OpenNSM). Jon is a security engineer at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign. In his announcement on the project's mailing list, Jon wrote:

The idea for this group came from a suggestion in Richard Bejtlich's most recent book, where he mentions it would be nice to see NSM groups spawn up all over much like other software user groups and for the same reasons.

Network security monitoring is the collection, analysis, and escalation of indications and warnings to detect and respond to intrusions. It is an operational campaign supporting a strategy of identifying and removing intruders before they accomplish their mission, thereby implementing a policy of minimizing loss due to intrusions. At the tactical and tool level, NSM relies on instrumenting the network and applying hunting and matching to find intruders.

Long-time blog readers know that I have developed and advocated NSM since the late 1990s, when I learned the practice at the Air Force Computer Emergency Response Team (AFCERT).

I am really pleased to see this group holding weekly meetings, which are available live or as recordings at YouTube.

The group is seeking funding and sponsorship to build a NSM laboratory and conduct research projects. They want to give students and active members hands-on experience with NSM tools and tactics to conduct defensive operations. They outline their plans for funding in this Google document.

I decided to support this group first as an individual, so I just donated $100 to the cause. If you are a like-minded individual, or perhaps represent an organization or company, please consider donating via GoFundMe to support this OpenNSM group and their project. You can also follow them at @opennsm and on Facebook, and check out their notes and code at GitHub. Thank you!

by Richard Bejtlich (noreply@blogger.com) at April 12, 2015 03:34 PM

SysAdmin1138

Paternity leave and on-call

It all started with this tweet.

Which you need to read (Medium.com). Some pull-quotes of interest:

My manager probably didn't realize that "How was your vacation" was the worst thing to ask me after I came back from paternity leave.

Patriarchy would have us believe that parenting is primarily the concern of the mother. Therefore paternity leave is a few extra days off for dad to chillax with his family and help mom out.

Beyond a recovery time from pregnancy, much of parental leave is learning to be a parent and adjusting to your new family and bonding with the baby. I can and did bond with the baby, but not as much as my female coworkers bonded with their babies.

I should also state, that I don't just want equality, I want a long time to bond with my child. Three months or more sounds nice. Not only can I learn to soothe him when he's upset, put him to sleep without worrying about being paged, but I can be around when he does the amazing things babies do in their first year: learning to sit, crawl, eat, stand and even walk.

At my current employer, I was shocked to learn that new dads get two weeks off.

Two.

At my previous startup, paternal leave was under the jurisdiction of the 'unlimited vacation' policy. Well...

Vacations are important. My friends would joke that the one way to actually be able to take vacations was to keep having children. Here the conflation was in jest, and also a caricature of the reality of vacations at startups.

We had a bit of a baby-boom while I was there. Dads were glared at if they showed up less than two weeks in and told to go home. After that, most of them worked part-time for a few weeks and slowly worked up to full time.

This article caused me to tweet...

The idea here is that IT managers who work for a company like mine with a really small amount of parental leave do have a bit of power to give Dad more time with the new kid: take them off of the call rota for a while. A better corporate policy is ideal, but it's a kind of local fix that just might help. Dad doesn't have to juggle the pager and the new kid at the same time.

Interesting idea, but not a great one.

Which is a critique of the disaster-resilience of 3-person teams. I was on one, and we had to coordinate Summer Vacation Season to ensure we had two-person coverage for most of it, and if 1-person was unavoidable, keep it to a couple days at best. None of us had kids while I was there (the other two had teenagers, and I wasn't about to start), so we didn't get to live through a paternity-leave sized hole in coverage.

Which is the kind of team I'm on right now, and why I thought of the idea. We have enough people that a person sized hole, even a Sr. Engineer sized hole, can be filled for several to many weeks in the rotation.

That's the ideal route though, and touches on a very human point: if you're in a company where you always check mail or can expect pages off-hours, it doesn't matter if you're not in the official call-rotation. That's a company culture problem independent of the on-call rotation.

My idea can work, but it takes the right culture to pull off. Extended leave would be much better, and is the kind of thing we should be advocating for.

You should still read the article.

by SysAdmin1138 at April 12, 2015 01:43 PM

Chris Siebenmann

Spam victims don't care what business unit is responsible for the spam

So what happened is that the other day I got some spam that our MTA received from one of the outbound.protection.outlook.com machines. Since sometimes I'm stubborn, I actually tried reporting this to abuse@outlook.com. After some go-arounds (apparently the Outlook abuse staff don't notice email messages if they're MIME attachments), I got the following reply:

Thank you for your report. Based on the message header information you have provided, this email appears to have originated from an Office 365 or Exchange Online tenant account. To report junk mail from Office 365 tenants, please send an email to junk@office365.microsoft.com and include the junk mail as an attachment.

Ha ha, no. As I put it on Twitter, your spam victims don't care about what exact business unit is responsible for the specific systems or customers or whatever that sent spam. Sorting that out is your business, not theirs. Telling people complaining about spam to report it to someone else is a classic 'see figure one' response. What it actually means, as everyone who gets this understands, is that Microsoft doesn't actually want to get spam reports and doesn't actually want to stop spam.

Oh, sure, there's probably some internal bureaucratic excuse here. Maybe the abuse@outlook.com team is being scored on metrics like 'spam incidents processed per unit time' and 'amount of spam per unit time', and not having to count this as 'their' spam or spend time forwarding the message to other business units helps the numbers out. But this doesn't let Microsoft off the hook, because Microsoft set these metrics and allows them to stand despite predictable crappy results. If Microsoft really cared, outlook.com would not be the massive spam emitter that it is. Instead Microsoft is thoroughly in the 'see figure one' and 'we're too big for you to block' business, just like a lot of other big email providers.

(For people who do not already know this, 'see figure one' refers to a certain sort of grim humour from the early days of Usenet and possibly before then, as covered here and here. The first one may be more original, but the 'we don't care, we don't have to, we're the phone company' attitude is also authentic for how people read this sort of situation. Application to various modern organizations in your life is left as an exercise to the reader.)

by cks at April 12, 2015 06:04 AM

April 11, 2015

Racker Hacker

Run virsh and access libvirt as a regular user

libvirt logoLibvirt is a handy way to manage containers and virtual machines on various systems. On most distributions, you can only access the libvirt daemon via the root user by default. I’d rather use a regular non-root user to access libvirt and limit that access via groups.

Modern Linux distributions use Polkit to limit access to the libvirt daemon. You can add an extra rule to the existing set of Polkit rules to allow regular users to access libvirtd. Here’s an example rule (in Javascript) from the ArchWiki:

/* Allow users in the wheel group to manage the libvirt
daemon without authentication */
polkit.addRule(function(action, subject) {
    if (action.id == "org.libvirt.unix.manage" &&
        subject.isInGroup("wheel")) {
            return polkit.Result.YES;
    }
});

As shown on the ArchWiki, I saved this file as /etc/polkit-1/rules.d/49-org.libvirt.unix.manager.rules. I’m using the wheel group to govern access to the libvirt daemon but you could use any group you choose. Just update the subject.isInGroup line in the rules file. You shouldn’t have to restart any daemons after adding the new rule file.

I’m now able to run virsh as my regular user:

[major@host ~]$ id
uid=1000(major) gid=1000(major) groups=1000(major),10(wheel) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[major@host ~]$ virsh list --all
 Id    Name                           State
----------------------------------------------------

The post Run virsh and access libvirt as a regular user appeared first on major.io.

by Major Hayden at April 11, 2015 03:30 PM

Chris Siebenmann

I wish systemd would get over its thing about syslog

Anyone who works with systemd soon comes to realize that systemd just doesn't like syslog very much. In fact systemd is so unhappy with syslog that it invented its own logging mechanism (in the form of journald). This is not news. What people who don't have to look deeply into the situation often don't realize is that systemd's dislike is sufficiently deep that systemd just doesn't interact very well with syslog.

I won't say that bugs and glitches 'abound', because I've only run into two issues so far (although both issues are relatively severe). One was that systemd mis-filed kernel messages under the syslog 'user' facility instead of the 'kernel' one; this bug made it past testing and into RHEL 7 / CentOS 7. The other is that sometimes on boot, randomly, systemd will barf up a significant chunk of old journal messages (sometimes very old) and re-send them to syslog. If you don't scroll back far enough while watching syslog logs, this can lead you to believe that something really bad and weird has happened.

(This has actually happened to me several times.)

This is stupid and wrongheaded on systemd's part. Yes, systemd doesn't like syslog. But syslog is extremely well established and extremely useful, especially in the server space. Part of that is historical practice, part of that is that syslog is basically the only cross-platform logging technology we have, and partly it's because you can do things like forward syslog to other machines, aggregate logs from multiple machines on one, and so on (and do so in a cross-platform way). And a good part of it is because syslog is simple text and it's always been easy to do a lot of powerful ad-hoc stuff with text. That systemd continually allows itself to ignore and interact badly with syslog makes everyone's life worse (except perhaps the systemd authors). Syslog is not going away just because the systemd authors would like it to and it is high time that systemd actually accepted that and started not just sort of working with syslog but working well with it.

One of systemd's strengths until now has been that it played relatively well (sometimes extremely well) with existing systems, warts and all. It saddens me to see systemd increasingly throw that away here.

(And I'll be frank, it genuinely angers me that systemd may feel that it can get away with this, that systemd is now so powerful that it doesn't have to play well with other systems and with existing practices. This sort of arrogance steps on real people; it's the same arrogance that leads people to break ABIs and APIs and then tell others 'well, that's your problem, keep up'.)

PS: If systemd people feel that systemd really does care about syslog and does its best to work well with it, well, you have two problems. The first is that your development process isn't managing to actually achieve this, and the second is that you have a perception problem among systemd users.

by cks at April 11, 2015 03:43 AM

Steve Kemp's Blog

Some things get moved, some things get doubled in size.

Relocation

We're about three months away from relocating from Edinburgh to Newcastle and some of the immediate panic has worn off.

We've sold our sofa, our spare sofa, etc, etc. We've bought a used dining-table, chairs, and a small sofa, etc. We need to populate the second-bedroom as an actual bedroom, do some painting, & etc, but things are slowly getting done.

I've registered myself as a landlord with the city council, so that I can rent the flat out without getting into trouble, and I'm in the process of discussing the income possibilities with a couple of agencies.

We're still unsure precisely which of the many hospitals in Newcastle my wife will be stationed at. That's frustrating because she could be in the city proper, or outside it. So we need to know before we can find a place to rent there.

Anyway moving? It'll be annoying, but we're making progress. Plus, how hard can it be?

VLAN Expansion

I previously had a /28 assigned for my own use; now I've doubled that to a /27, which gives me the ability to create more virtual machines and run SSL on some websites.

Using SNI I've actually got the ability to run SSL on almost all sites. So I configured myself as a CA and generated a bunch of certificates for myself. (Annoyingly, few tutorials on running a CA mentioned SNI, so it took a few attempts to get the SAN working. But once I got the hang of it, it was simple enough.)

So if you have my certificate authority file installed you can browse many, many of my interesting websites over SSL.

SSL

I run a number of servers behind a reverse-proxy. At the moment the back-end is lighttpd. Now that I have SSL set up, incoming requests hit the proxy, get routed to lighttpd, and all is well. Mostly.

However redirections break. A request for:

  • https://lumail.org/docs

Gets rewritten to:

  • http://lumail.org/docs/

That is because lighttpd generates the redirection and it only sees the HTTP connection. It seems there is mod_extforward which should allow the server to be aware of the SSL - but it doesn't do so in a useful fashion.

So right now most of my sites are SSL-enabled, but sometimes they'll flip to naked and unprotected. Annoying.

I don't yet have a solution..

April 11, 2015 12:00 AM

April 10, 2015

LZone - Sysadmin

Puppet Solve Invalid byte sequence in US-ASCII

When you run "puppet agent" and get
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: invalid byte 
sequence in US-ASCII at /etc/puppet/modules/vendor/
or run "puppet apply" and get
Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not 
parse for environment production: invalid byte sequence in US-ASCII at /etc/puppet/manifests/site.pp:1
then the root cause is probably the currently configured locale. Check the effective Ruby locale with
ruby -e 'puts Encoding.default_external'
Ensure that it returns a UTF-8 capable locale; if needed, set one and rerun Puppet:
export LANG=de_DE.utf-8
export LC_ALL=de_DE.utf-8

April 10, 2015 09:16 PM

Web Developer Solution Index

After a friend of mine suggested reading "The things you need to know to do web development", I felt the need to compile a solution index for the experiences described. In this interesting blog post the author describes his view of the typical learning curve of a web developer and the tools, solutions and concepts he discovers as he becomes a successful developer.

I do not want to summarize the post but I wanted to compile a list of those solutions and concepts affecting the life of a web developer.





Markup Standards Knowledge: HTML, CSS, JSON, YAML
Web Stack Layering: Basic knowledge about
  • Using TCP as transport protocol
  • Using HTTP as application protocol
  • Using SSL to encrypt the application layer with HTTPS
  • Using SSL certificates to proof identity for websites
  • Using (X)HTML for the application layer
  • Using DOM to access/manipulate application objects
Web Development Concepts
  • 3 tier server architecture
  • Distinction of static/dynamic content
  • Asynchronous CSS, JS loading
  • Asynchronous networking with Ajax
  • CSS box models
  • CSS Media Queries
  • Content Delivery Networks
  • UX, Usability...
  • Responsive Design
  • Page Speed Optimization
  • HTTP/HTTPS content mixing
  • Cross domain content mixing
  • MIME types
  • API-Pattern: RPC, SOAP, REST
  • Localization and Internationalization
Developer Infrastructure
  • Code Version Repo: usually Git. Hosted github.com or self-hosted e.g. gitlab
  • Continuous Integration: Jenkins, Travis
  • Deployment: Jenkins, Travis, fabric, Bamboo, CruiseControl
Frontend JS Frameworks: Mandatory knowledge of jQuery, as well as one or more JS frameworks such as Bootstrap, Foundation, React, Angular, Ember, Backbone, Prototype, GWT or YUI


Localization and Internationalization: Frontend usually via a JS lib, e.g. LocalePlanet or Globalize; backend often via gettext or a similar mechanism

Precompiling Resources:
  • For Javascript: Minify
  • For CSS:
  • For Images: ImageMagick

Test everything with Google PageSpeed Insights

Backend Frameworks: By language
  • PHP: CakePHP, CodeIgniter, Symfony, Seagull, Zend, Yii (choose one)
  • Python: Django, Tornado, Pylons, Zope, Bottle (choose one)
  • Ruby: Rails, Merb, Camping, Ramaze (choose one)


Web Server Solutions: nginx, Apache

For loadbalancing: nginx, haproxy

As PHP webserver: nginx+PHP-FPM

RDBMS: MySQL (maybe Percona, MariaDB), Postgres

Caching/NoSQL: Without replication: memcached, memcachedb, Redis

With replication: Redis, Couchbase, MongoDB, Cassandra

Good comparisons: #1 #2

Hosting: If you are unsure about self-hosting vs. cloud hosting, have a look at the Cloud Calculator.

Blogs: Do not try to self-host blogs. You will fail at keeping them secure and up-to-date, and sooner or later they will be hacked. Go with a blog hoster right from the start: Choose provider

April 10, 2015 09:16 PM

Chef-Editing-Config-Files

Most Chef recipes are about installing new software including all config files. Even when they are configuration recipes, they usually overwrite the whole file and provide a completely recreated configuration. If you have used cfengine or Puppet with augtool before, you'll miss the lightweight in-place editing of config files.

In cfengine2...

You could write
editfiles:
{ home/.bashrc
   AppendIfNoSuchLine "alias rm='rm -i'"
}

While in puppet...

You'd have:
augeas { "sshd_config":
  context => "/files/etc/ssh/sshd_config",
  changes => [
    "set PermitRootLogin no",
  ],
}

Now how to do it in Chef?

Maybe I have missed the correct way to do it so far (please comment if this is the case!), but there seems to be no way to use, for example, augtool with Chef, and there is no built-in cfengine-like editing. The only way I've seen so far is to use Ruby as a scripting language to change files using the Ruby runtime, or to use the Script resource which allows running other interpreters like bash, csh, perl, python or ruby.

To use it you define a block named after the interpreter you need and add a "code" attribute with a heredoc operator (e.g. <<-EOT) containing the commands. Additionally you can specify a working directory and a user for the script to be executed as. Example:
bash "some_commands" do
    user "root"
    cwd "/tmp"
    code <<-EOT
       echo "alias rm='rm -i'" >> /root/.bashrc
    EOT
end
While it is not a one-liner as in cfengine, it is very flexible. The Script resource is widely used in community cookbooks to perform ad-hoc source compilation and installation, but we can also use it for standard file editing.

Finally, to do conditional editing use not_if/only_if guards at the end of the Script resource block, as sketched below.
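For example, a minimal sketch of a guarded append (the resource name and grep guard are my own illustration, not from any particular cookbook):
bash "add_rm_alias" do
    user "root"
    cwd "/tmp"
    code <<-EOT
       echo "alias rm='rm -i'" >> /root/.bashrc
    EOT
    # skip the resource if the alias is already present
    not_if "grep -qF \"alias rm='rm -i'\" /root/.bashrc"
end
The string form of not_if runs as a shell command; if it exits with 0 the resource is skipped, so the edit stays idempotent.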

April 10, 2015 09:16 PM

Puppet Apply Only Specific Classes

If you want to apply Puppet changes in a selective manner you can run
puppet apply -t --tags Some::Class
on the client node to only run the single class named "Some::Class".

Why does this work? Because Puppet automatically creates tags for all classes you have. Make sure to upper-case all parts of the class name: even if your actual class is "some::class", the Puppet tag will be "Some::Class".

April 10, 2015 09:16 PM

Puppet Agent Noop Pitfalls

The puppet agent command has a --noop switch that allows you to perform a dry-run of your Puppet code.
puppet agent -t --noop
It doesn't change anything, it just tells you what it would change. More or less exactly, since dependencies might come into existence through runtime changes. But it is pretty helpful, and all Puppet users I know use it from time to time.

Unexpected Things

But there are some unexpected things about the noop mode:
  1. A --noop run does trigger the report server.
  2. The --noop run rewrites the YAML state files in /var/lib/puppet
  3. And there is no state on the local machine that gives you the last "real" run result after you overwrite the state files with the --noop run.

Why might this be a problem?

Or the other way around: why does Puppet think this is not a problem? Probably because Puppet, as an automation tool, is expected to overwrite, and the past state doesn't really matter. If you use PE or Puppet with PuppetDB or Foreman you have reporting for past runs anyway, so there is no need to keep a history on the Puppet client.

Why I still do not like it: it gets in the way of safe and simple local Nagios checks. Using the state YAML you might want to build a simple script checking for run errors, because you might want a Nagios alert about all errors that appear, or about hosts that have not run Puppet for quite some time (for example because I disabled Puppet on a server for some action and forgot to re-enable it). Such a check reports false positives after each --noop run until the next normal run, and that hides errors.
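A minimal sketch of what such a local check could look like, assuming the default Puppet 3 state file /var/lib/puppet/state/last_run_summary.yaml (path, keys and thresholds are my assumptions, adjust to your setup):
#!/bin/bash
# naive Nagios-style check against Puppet's local run summary
# note: a --noop run rewrites this file too, which is exactly the pitfall described above
SUMMARY=/var/lib/puppet/state/last_run_summary.yaml
MAX_AGE=7200   # seconds after which a run counts as overdue

[ -r "$SUMMARY" ] || { echo "UNKNOWN: cannot read $SUMMARY"; exit 3; }

last_run=$(awk '/ last_run:/ {print $2}' "$SUMMARY")
failed=$(awk '/ failed:/ {print $2; exit}' "$SUMMARY")
age=$(( $(date +%s) - ${last_run:-0} ))

if [ "${failed:-0}" -gt 0 ]; then
    echo "CRITICAL: $failed resources failed in last Puppet run"; exit 2
elif [ "$age" -gt "$MAX_AGE" ]; then
    echo "WARNING: last Puppet run was $age seconds ago"; exit 1
else
    echo "OK: Puppet ran $age seconds ago with no failed resources"; exit 0
fi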

Of course you can build all this with cool DevOps-style SQL/REST/... queries against PuppetDB/Foreman, but checking state locally feels more like the old-style, robust and simple sysadmin way. Actively asking the Puppet master or report server for the client's state seems wrong. The client should know too.

From a software usability perspective I do not expect a tool to change its state when I pass --noop. It's unexpected. Of course the documentation is carefully phrased:
Use 'noop' mode where the daemon runs in a no-op or dry-run mode. This is useful for seeing what changes Puppet will make without actually executing the changes.

April 10, 2015 09:16 PM

Getting rid of Bash Ctrl-R

Today was a good day, as I stumbled over this post (at http://architects.dzone.com) hinting at the following bash key bindings:
bind '"\e[A":history-search-backward'
bind '"\e[B":history-search-forward'
It changes the behaviour of the up and down cursor keys to not go blindly through the history but only through items matching the current prompt. Of course this comes at the disadvantage of having to clear the line to go through the full history. But as this can be achieved with a Ctrl-C at any time, it is still preferable to Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R Ctrl+R ....
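To make this permanent you can put the two bind commands into your ~/.bashrc, or alternatively (my preference, not mentioned in the linked post) add the raw readline bindings to ~/.inputrc so they apply to every readline-using program:
"\e[A": history-search-backward
"\e[B": history-search-forward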

April 10, 2015 09:16 PM

Redis Performance Debugging

Here are some simple hints on debugging Redis performance issues.

Monitoring Live Redis Queries

Run the "monitor" command to see queries as they are sent against an Redis instance. Do not use on high traffic instance!
redis-cli monitor
The output looks like this
redis 127.0.0.1:6379> MONITOR
OK
1371241093.375324 "monitor"
1371241109.735725 "keys" "*"
1371241152.344504 "set" "testkey" "1"
1371241165.169184 "get" "testkey"

Analyzing Slow Commands

When there are too many queries, better use "slowlog" to see the top slow queries running against your Redis instance:
slowlog get 25		# print top 25 slow queries
slowlog len		# number of entries in the slow log
slowlog reset		# clear the slow log

Debugging Latency

If you suspect latency to be an issue, use the built-in latency measuring support of "redis-cli". First measure system latency on your Redis server with
redis-cli --intrinsic-latency 100
and then sample from your Redis clients with
redis-cli --latency -h <host> -p <port>
If you have problems with high latency, check whether transparent huge pages are disabled. If not, disable them with
echo never > /sys/kernel/mm/transparent_hugepage/enabled

Check Background Save Settings

If your instance seemingly freezes periodically, you probably have background dumping enabled.
grep ^save /etc/redis/redis.conf
Comment out all save lines and set up a cron job to do the dumping, or a Redis slave that can dump whenever it wants to.
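Such a cron job can be as simple as this sketch (the schedule and the use of /etc/cron.d are my own choices, not from the original post):
# /etc/cron.d/redis-dump: trigger an RDB dump at 03:00 when traffic is low
0 3 * * * root redis-cli BGSAVE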

Alternatively you can try to mitigate the effect using the "no-appendfsync-on-rewrite" option (set to "yes") in redis.conf.

Check fsync Setting

By default Redis runs fsync() every second. Other possibilities are "always" and "no".
grep ^appendfsync /etc/redis/redis.conf
So if you do not care about DB corruption you might want to set "no" here.
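That change is a single line in redis.conf (assuming the append-only file is enabled at all):
appendfsync no    # instead of the default "everysec"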

April 10, 2015 09:16 PM

How to Munin Graph JVM Memory Usage with Ubuntu tomcat

The following description works when using the Ubuntu "tomcat7" package:

Grab the "java/jstat__heap" plugin from munin-contrib @ github and place it into "/usr/share/munin/plugins/jstat__heap".

Link the plugin into /etc/munin/plugins
ln -s /usr/share/munin/plugins/jstat__heap /etc/munin/plugins/jstat_myname_heap
Choose some useful name instead of "myname". This allows you to monitor multiple JVM setups.

Configure each link you created, for example in a new plugin config file named "/etc/munin/plugin-conf.d/jstat", which should contain one section per JVM looking like this:
[jstat_myname_heap]
user tomcat7
env.pidfilepath /var/run/tomcat7.pid
env.javahome /usr/
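Before waiting for the next Munin poll you can verify the plugin manually (assuming a standard munin-node setup; the plugin name is the name of the symlink you created):
service munin-node restart
sudo munin-run jstat_myname_heap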

April 10, 2015 09:16 PM

Removing newlines with sed

My goal for today: I want to remember the official sed FAQ solution for removing newlines:
sed ':a;N;$!ba;s/\n//g' file
to avoid spending a lot of time on it when I need it again.
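For the simple delete-all-newlines case, tr does the same with less head-scratching (not from the sed FAQ, just an alternative I tend to reach for):
tr -d '\n' < file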

April 10, 2015 09:16 PM

Puppet: List Changed Files

If you want to know which files were changed by Puppet in the last days:
cd /var/lib/puppet
for i in $(find clientbucket/ -name paths); do
	echo "$(stat -c %y $i | sed 's/\..*//')       $(cat $i)";
done | sort -n
will give you output like
[...]
2015-02-10 12:36:25       /etc/resolv.conf
2015-02-17 10:52:09       /etc/bash.bashrc
2015-02-20 14:48:18       /etc/snmp/snmpd.conf
2015-02-20 14:50:53       /etc/snmp/snmpd.conf
[...]
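To actually restore one of those versions: the name of the directory containing each "paths" file is the content's md5 sum, which puppet filebucket can restore from. A sketch (options assumed from the puppet filebucket man page, check before use):
puppet filebucket --local --bucket /var/lib/puppet/clientbucket restore /etc/snmp/snmpd.conf <md5sum>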

April 10, 2015 09:16 PM

Sharing Screen With Multiple Users

How to detect screen sessions of other users:

screen -ls <user name>/

How to open screen to other users:

  1. Ctrl-A :multiuser on
  2. Ctrl-A :acladd <user to grant access>

Attach to another user's screen session:

With session name
screen -x <user name>/<session name>
With PID and tty
screen -x <user name>/<pid>.<ptty>.<host>

April 10, 2015 09:16 PM

Static Code Analysis of any Autotools Project with OCLint

The following is a HowTo describing the setup of OCLint for any C/C++ project using autotools.

1. OCLint Setup

The first step is downloading OCLint. As there are no packages so far, it's just a matter of extracting the tarball somewhere in $HOME. Check the latest release link on http://archives.oclint.org/releases/.
cd
wget "http://archives.oclint.org/releases/0.8/oclint-0.8.1-x86_64-linux-3.13.0-35-generic.tar.gz"
tar zxvf oclint-0.8.1-x86_64-linux-3.13.0-35-generic.tar.gz 
This should leave you with a copy of OCLint in ~/oclint-0.8.1

2. Bear Setup

As projects usually consist of a lot of source files in different subdirectories, it is hard for a linter to know where to look for files. While "cmake" has support for dumping a list of the source files it processes during a run, "make" doesn't. This is where the "Bear" wrapper comes into play: instead of
make
you run
bear make
so "bear" can track all files being compiled. It will dump a JSON file "compile_commands.json" which OCLint can use to do analysis of all files.

To setup Bear do the following
cd
git clone https://github.com/rizsotto/Bear.git
cd Bear
cmake .
make

3. Analyzing Code

Now we have all the tools we need. Let's take some autotools project like Liferea as an example. Before doing code analysis it needs to be downloaded and built at least once:
git clone https://github.com/lwindolf/liferea.git
cd liferea
sh autogen.sh
make
Now we collect all code file compilation instructions with bear:
make clean
bear make
And if this succeeds we can start a complete analysis with
~/oclint-0.8.1/bin/oclint-json-compilation-database
which will run OCLint with the input from the "compile_commands.json" produced by "bear". Don't call "oclint" directly, as you'd need to pass all compile flags manually.
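As far as I can see, oclint-json-compilation-database passes everything after a double dash on to oclint itself, so an HTML report could be produced roughly like this (treat the flags as an assumption and check the OCLint docs):
~/oclint-0.8.1/bin/oclint-json-compilation-database -- -report-type html -o report.html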

If all went well you should see code analysis lines like these:
[...]
conf.c:263:9: useless parentheses P3 
conf.c:274:9: useless parentheses P3 
conf.c:284:9: useless parentheses P3 
conf.c:46:1: high cyclomatic complexity P2 Cyclomatic Complexity Number 33 exceeds limit of 10
conf.c:157:1: high cyclomatic complexity P2 Cyclomatic Complexity Number 12 exceeds limit of 10
conf.c:229:1: high cyclomatic complexity P2 Cyclomatic Complexity Number 30 exceeds limit of 10
conf.c:78:1: long method P3 Method with 55 lines exceeds limit of 50
conf.c:50:2: short variable name P3 Variable name with 2 characters is shorter than the threshold of 3
conf.c:52:2: short variable name P3 Variable name with 1 characters is shorter than the threshold of 3
[...]

April 10, 2015 09:16 PM

How Common Are HTTP Security Headers Really?

A recent issue of the German iX magazine featured an article on improving end user security by enabling HTTP security headers:
  • X-XSS-Protection,
  • X-Content-Type-Options (against MIME type sniffing),
  • Content-Security-Policy,
  • X-Frame-Options,
  • and HSTS (Strict-Transport-Security).
The article gave the impression that all of them are quite common and that a good DevOps would be unreasonable not to implement them immediately if the application supports them without problems.

This led me to check my monthly domain scan results of April 2014 to see who is actually using which header on their main pages. Results as always limited to the top 200 Alexa sites and all larger German websites.

Usage of X-XSS-Protection

The header is visible on only 14 of 245 (5%) of the scanned websites. As 2 of them just disable the setting, only 4% of the websites actually enable it.
Website                    Header
www.adcash.com             X-XSS-Protection: 1; mode=block
www.badoo.com              X-XSS-Protection: 1; mode=block
www.blogger.com            X-XSS-Protection: 1; mode=block
www.blogspot.com           X-XSS-Protection: 1; mode=block
www.facebook.com           X-XSS-Protection: 0
www.feedburner.com         X-XSS-Protection: 1; mode=block
www.github.com             X-XSS-Protection: 1; mode=block
www.google.de              X-XSS-Protection: 1; mode=block
www.live.com               X-XSS-Protection: 0
www.meinestadt.de          X-XSS-Protection: 1; mode=block
www.openstreetmap.org      X-XSS-Protection: 1; mode=block
www.tape.tv                X-XSS-Protection: 1; mode=block
www.xing.de                X-XSS-Protection: 1; mode=block; report=https://www.xing.com/tools/xss_reporter
www.youtube.de             X-XSS-Protection: 1; mode=block; report=https://www.google.com/appserve/security-bugs/log/youtube

Usage of X-Content-Type-Options

Here 15 of 245 websites (6%) enable the option.
Website                    Header
www.blogger.com            X-Content-Type-Options: nosniff
www.blogspot.com           X-Content-Type-Options: nosniff
www.deutschepost.de        X-Content-Type-Options: NOSNIFF
www.facebook.com           X-Content-Type-Options: nosniff
www.feedburner.com         X-Content-Type-Options: nosniff
www.github.com             X-Content-Type-Options: nosniff
www.linkedin.com           X-Content-Type-Options: nosniff
www.live.com               X-Content-Type-Options: nosniff
www.meinestadt.de          X-Content-Type-Options: nosniff
www.openstreetmap.org      X-Content-Type-Options: nosniff
www.spotify.com            X-Content-Type-Options: nosniff
www.tape.tv                X-Content-Type-Options: nosniff
www.wikihow.com            X-Content-Type-Options: nosniff
www.wikipedia.org          X-Content-Type-Options: nosniff
www.youtube.de             X-Content-Type-Options: nosniff

Usage of Content-Security-Policy

Actually only 1 website among the top 200 Alexa-ranked websites uses CSP, and this lonely site is GitHub. The problem with CSP is obviously the necessity of having a clear structure for the origin domains of the site's elements. And the fewer advertisements and tracking pixels you have, the easier it becomes...
Website                    Header
www.github.com             Content-Security-Policy: default-src *; script-src https://github.global.ssl.fastly.net https://ssl.google-analytics.com https://collector-cdn.github.com; style-src 'self' 'unsafe-inline' 'unsafe-eval' https://github.global.ssl.fastly.net; object-src https://github.global.ssl.fastly.net

Usage of X-Frame-Options

The X-Frame-Options header is currently delivered by 43 of 245 websites (17%).
Website                    Header
www.adcash.com             X-Frame-Options: SAMEORIGIN
www.adf.ly                 X-Frame-Options: SAMEORIGIN
www.avg.com                X-Frame-Options: SAMEORIGIN
www.badoo.com              X-Frame-Options: DENY
www.battle.net             X-Frame-Options: SAMEORIGIN
www.blogger.com            X-Frame-Options: SAMEORIGIN
www.blogspot.com           X-Frame-Options: SAMEORIGIN
www.dailymotion.com        X-Frame-Options: deny
www.deutschepost.de        X-Frame-Options: SAMEORIGIN
www.ebay.de                X-Frame-Options: SAMEORIGIN
www.facebook.com           X-Frame-Options: DENY
www.feedburner.com         X-Frame-Options: SAMEORIGIN
www.github.com             X-Frame-Options: deny
www.gmx.de                 X-Frame-Options: deny
www.gmx.net                X-Frame-Options: deny
www.google.de              X-Frame-Options: SAMEORIGIN
www.groupon.de             X-Frame-Options: SAMEORIGIN
www.imdb.com               X-Frame-Options: SAMEORIGIN
www.indeed.com             X-Frame-Options: SAMEORIGIN
www.instagram.com          X-Frame-Options: SAMEORIGIN
www.java.com               X-Frame-Options: SAMEORIGIN
www.linkedin.com           X-Frame-Options: SAMEORIGIN
www.live.com               X-Frame-Options: deny
www.mail.ru                X-Frame-Options: SAMEORIGIN
www.mozilla.org            X-Frame-Options: DENY
www.netflix.com            X-Frame-Options: SAMEORIGIN
www.openstreetmap.org      X-Frame-Options: SAMEORIGIN
www.oracle.com             X-Frame-Options: SAMEORIGIN
www.paypal.com             X-Frame-Options: SAMEORIGIN
www.pingdom.com            X-Frame-Options: SAMEORIGIN
www.skype.com              X-Frame-Options: SAMEORIGIN
www.skype.de               X-Frame-Options: SAMEORIGIN
www.softpedia.com          X-Frame-Options: SAMEORIGIN
www.soundcloud.com         X-Frame-Options: SAMEORIGIN
www.sourceforge.net        X-Frame-Options: SAMEORIGIN
www.spotify.com            X-Frame-Options: SAMEORIGIN
www.stackoverflow.com      X-Frame-Options: SAMEORIGIN
www.tape.tv                X-Frame-Options: SAMEORIGIN
www.web.de                 X-Frame-Options: deny
www.wikihow.com            X-Frame-Options: SAMEORIGIN
www.wordpress.com          X-Frame-Options: SAMEORIGIN
www.yandex.ru              X-Frame-Options: DENY
www.youtube.de             X-Frame-Options: SAMEORIGIN

Usage of HSTS Strict-Transport-Security

HSTS headers can only be found on a few front pages (8 of 245). Maybe it is more visible on login pages and avoided on front pages for performance reasons, maybe not; that would require further analysis. What can be said is that only some larger technology leaders are brave enough to use it on the front page:
Website                    Header
www.blogger.com            Strict-Transport-Security: max-age=10893354; includeSubDomains
www.blogspot.com           Strict-Transport-Security: max-age=10893354; includeSubDomains
www.facebook.com           Strict-Transport-Security: max-age=2592000
www.feedburner.com         Strict-Transport-Security: max-age=10893354; includeSubDomains
www.github.com             Strict-Transport-Security: max-age=31536000
www.paypal.com             Strict-Transport-Security: max-age=14400
www.spotify.com            Strict-Transport-Security: max-age=31536000
www.upjers.com             Strict-Transport-Security: max-age=47336400

Conclusion

Security headers are not widespread, at least not on website front pages. Most used is the X-Frame-Options header to prevent clickjacking, followed by X-Content-Type-Options to prevent MIME sniffing. Both of course are easy to implement, as they most probably do not change your website's behaviour. I'd expect to see more HSTS on bank and other online payment service websites, but it might well be that the headers appear only on subsequent redirects when logging in, which this scan doesn't do. With CSP being the hardest to implement, as you need complete control over all domain usage by application content and embedded partner content, it is no wonder that only github.com has implemented it. For me it is an indication of how clean their web application actually is.
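If you want to add the two easy wins to your own site, both are one-line response headers in the web server configuration. A minimal nginx sketch (Apache has equivalent Header directives):
add_header X-Frame-Options SAMEORIGIN;
add_header X-Content-Type-Options nosniff;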

April 10, 2015 09:16 PM

Screen tmux Cheat Sheet

Here is a side by side comparison of screen and tmux commands and hotkeys.
Function             Screen                                  tmux
Start instance       screen / screen -S <name>               tmux
Attach to instance   screen -r <name> / screen -x <name>     tmux attach
List instances       screen -ls / screen -ls <user name>/    tmux ls
New Window           ^a c                                    ^b c
Switch Window        ^a n / ^a p                             ^b n / ^b p
List Windows         ^a "                                    ^b w
Name Window          ^a A                                    ^b ,
Split Horizontal     ^a S                                    ^b "
Split Vertical       ^a |                                    ^b %
Switch Pane          ^a Tab                                  ^b o
Kill Pane            ^a x                                    ^b x
Paging                                                       ^b PgUp / ^b PgDown
Scrolling Mode       ^a [                                    ^b [

April 10, 2015 09:16 PM

Standalone Sysadmin

Dealing with key-based authentication on Windows with Putty

I’m writing this entry because I’m going to be writing another entry soon, and I want to point to this rather than explain it in situ. 

Here lately, I’ve been using Windows on my desktop. At work, this is mostly because of the extensive administration I do with VMware, and there’s STILL no native way on Linux or Mac to do things like Update Manager, and at home, because I play lots of video games. Lots. Of. Games.

The end result is that I spend a lot of time using Putty. There are a lot of Windows-specific SSH clients, but I like Putty's great combination of being tiny, running without any actual installation, and having a reasonably dense feature set. If you're on Windows and you need to deal with Linux hosts, you're probably already using Putty, but maybe not as completely as you could be.

There is a small ecosystem of applications that work with Putty, including sftp clients and an SSH client that runs in the Windows command prompt (plink). They're all available on the same Putty download page. The biggest win, in my opinion, is to combine it with Pageant. Much like ssh-agent on Linux, Pageant manages your SSH keys, allowing you to log into remote hosts without typing passwords, and only typing your key's passphrase once.

The first step with key-based authentication is to actually generate some keys. For Pageant, the easiest way is probably to use PuttyGen, which looks like this:

Click “Generate” and move the mouse around as the directions say:

This produces your actual key:


You want to type in a "Key passphrase" that is a long-ish phrase that you can remember well enough to re-type occasionally. Once you've done that, click "Save public key", make a keys directory, and save it in there, then do the same with "Save private key". You should care that people don't get the private key, but your passphrase should be long enough that it's unlikely that anyone could brute-force your key before you change it or lose it, or maybe, if you like typing, until the heat death of the universe.

Copy the text at the top and save that into notepad so we can have it after this closes. We can get it again by re-running the key generator, but if you’re like me, you didn’t install it, you just kind of ran it from your downloads, and you’d probably have to download it again to run it again, so just keep the text in Notepad for now.

Alright, so now you want to download Pageant and this time, you want to save it somewhere useful. I have a “Programs” directory that I made under C:\Users\msimmons\ that holds stuff like this, so I saved it there. Once it was there, I right clicked and said “Create Shortcut”, which I then dragged into C:\Users\msimmons\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup – this makes sure that Pageant will start when I log in. By default, that won’t actually load my key, though, so we have to edit the properties on the shortcut and add the key as an argument to the executable:
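The shortcut's Target field then ends up looking something like this (Pageant accepts key files as command line arguments and loads them at startup; the paths are illustrative, adjust them to wherever you keep pageant.exe and your key):
"C:\Users\msimmons\Programs\pageant.exe" "C:\Users\msimmons\keys\mykey.ppk"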


Now, when you log in, you’ll be prompted to type the passphrase to your private key, which will allow you to put that public key into the authorized_keys of a target host and authenticate as that user without typing a password every time! Excellent!

by Matt Simmons at April 10, 2015 01:00 PM

Chris Siebenmann

My Firefox 37 extensions and addons (sort of)

A lot has changed in less than a year since I last tried to do a comprehensive inventory of my extensions, so I've decided it's time for an update since things seem to have stabilized for the moment. I'm labeling this as for Firefox 37 since that's the latest version, just out, but I'm actually running Firefox Nightly (although for me it's more like 'Firefox Weekly', since I only bother quitting Firefox to switch to the very latest build once in a while). I don't think any of these extensions work better in Nightly than in Firefox 37 (if anything, some of them may work better in F37).

Personally I hope I'm still using this set of extensions a year from now, but with Firefox (and its addons) you never know.

Safe browsing:

  • NoScript to disable JavaScript for almost everything. In a lot of cases I don't even bother with temporary whitelisting; if a site looks like it's going to want lots of JavaScript, I just fire it up in my Chrome Incognito environment.

    NoScript is about half of my Flash blocking, but is not the only thing I have to rely on these days.

  • FlashStopper is the other half of my Flash blocking and my current solution to my Flash video hassles on YouTube, after FlashBlock ended up falling over. Note that contrary to what its name might lead you to expect, FlashStopper blocks HTML5 video too, with no additional extension needed.

    (In theory I should be able to deal with YouTube with NoScript alone, and this even works in my testing Firefox. Just not in my main one for some reason. FlashStopper is in some ways nicer than using NoScript for this; for instance, you see preview pictures for YouTube videos instead of a big 'this is blocked' marker.)

  • µBlock has replaced the AdBlock family as my ad blocker. As mentioned I mostly have this because throwing out YouTube ads makes YouTube massively nicer to use. Just as other people have found, µBlock clearly takes up the least memory out of all of the options I've tried.

    (While I'm probably not all that vulnerable to ad security issues, it doesn't hurt my mood that µBlock deals with these too.)

  • CS Lite Mod is my current 'works on modern Firefox versions' replacement for CookieSafe after CookieSafe's UI broke for me recently (I needed to whitelist a domain and discovered I couldn't any more). It appears to basically work just like CookieSafe did, so I'm happy.

I've considered switching to Self-Destructing Cookies, but how SDC mostly works is not how I want to deal with cookies. It would be a good option if I had to use a lot of cookie-requiring sites that I didn't trust for long, but I don't; instead I either trust sites completely or don't want to accept cookies from them at all. Maybe I'm missing out on some conveniences that SDC would give me by (temporarily) accepting more cookies, but so far I'm not seeing it.

My views on Ghostery haven't changed since last time. It seems especially pointless now that I'm using µBlock, although I may be jumping to assumptions here.

User interface (in a broad sense):

  • FireGestures. I remain absolutely addicted to controlling my browser with gestures and this works great.

    (Lack of good gestures support is the single largest reason I won't be using Chrome regularly any time soon (cf).)

  • It's All Text! handily deals with how browsers make bad editors. I use it a bunch these days, and in particular almost all of my comments here on Wandering Thoughts are now written with it, even relatively short ones.

  • Open in Browser because most of the time I do not want to download a PDF or a text file or a whatever, I want to view it right then and there in the browser and then close the window to go on with something else. Downloading things is a pain in the rear, at least on Linux.

(I wrote more extensive commentary on these addons last time. I don't feel like copying it all from there and I have nothing much new to say.)

Miscellaneous:

  • HTTPS Everywhere basically because I feel like using HTTPS more. This sometimes degrades or breaks sites that I try to browse, but most of my browsing is not particularly important so I just close the window and go do something else (often something more productive).

  • CipherFox gives me access to some more information about TLS connections, although I'd like a little bit more (like whether or not a connection has perfect forward secrecy). Chrome gets this right even in the base browser, so I wish Firefox could copy them and basically be done.

Many of these addons like to plant buttons somewhere in your browser window. The only one of these that I tolerate is NoScript's, because I use that one reasonably often. Everyone else's button gets exiled to the additional dropdown menu where they work pretty fine on the rare occasions when I need them.

(I would put more addon buttons in the tab bar area if they weren't colourful. As it is, I find the bright buttons too distracting next to the native Firefox menu icons I put there.)

I've been running this combination of addons in Firefox Nightly sessions that are now old enough that I feel pretty confident that they don't leak memory. This is unlike any number of other addons and combinations that I've tried; something in my usage patterns seems to be really good at making Firefox extensions leak memory. This is one reason I'm so stuck on many of my choices and so reluctant to experiment with new addons.

(I would like to be able to use Greasemonkey and Stylish but both of them leak memory for me, or at least did the last time I bothered to test them.)

PS: Firefox Nightly has for some time been trying to get people to try out Electrolysis, their multi-process architecture. I don't use it, partly because any number of these extensions don't work with it and probably never will. You can apparently check the 'e10s' status of addons here; I see that NoScript is not e10s ready, for example, which completely rules out e10s for me. Hopefully Mozilla won't be stupid enough to eventually force e10s and thus break a bunch of these addons.

by cks at April 10, 2015 06:15 AM

April 09, 2015

Chris Siebenmann

Probably why Fedora puts their release version in package release numbers

Packaging schemes like RPM and Debian debs split full package names up into three components: the name, the (upstream) version, and the (distribution) release of the package. Back when people started making RPM packages, the release component tended to be just a number, giving you full names like liferea-1.0.9-1 (this is release 1 of Liferea 1.0.9). As I mentioned recently, the modern practice of Fedora release numbers has changed to include the distribution version. Today we have liferea-1.10.13-1.fc21 instead (on Fedora 21, as you can see). Looking at my Fedora systems, this appears to be basically universal.

Before I started writing this entry and really thinking about the problem, I thought there was a really good deep reason for this. However, now I think it's so that if you're maintaining the same version of a package on both Fedora 20 and Fedora 21, you can use the exact same .spec file. As an additional reason, it makes automated rebuilds of packages for (and in) new Fedora versions easier and work better for upgrades (in that someone upgrading Fedora versions will wind up with the new version's packages).

The simple magic is in the .spec file:

Release: 1%{?dist}

The RPM build process will substitute this in at build time with the Fedora version you're building on (or for), giving you release numbers like 1.fc20 and 1.fc21. Due to this substitution, any RPM .spec file that does releases this way can be automatically rebuilt on a new Fedora version without needing any .spec file changes (and you'll still get a new RPM version that will upgrade right, since RPM sees 1.fc21 as being more recent than 1.fc20).

The problem that this doesn't really deal with (and I initially thought it did) is wanting to build an update to the Fedora 20 version of a RPM without updating the Fedora 21 version. If you just increment the release number of the Fedora 20 version, you get 2.fc20 and the old 1.fc21 and then upgrades won't work right (you'll keep the 2.fc20 version of the RPM). You'd have to change the F20 version to a release number of, say, '1.fc20.1'; RPM will consider this bigger than 1.fc20 but smaller than 1.fc21, so everything works out.
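A quick way to check this sort of ordering is RPM's own comparison routine; a sketch of mine using the Python rpm bindings that ship with Fedora (the version numbers here are made up):
python -c 'import rpm; print rpm.labelCompare(("0", "1.0.9", "1.fc20.1"), ("0", "1.0.9", "1.fc21"))'
This should print -1, meaning the left EVR sorts as older, so the .fc21 package still wins on an upgrade.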

(I suspect that the current Fedora answer here is 'don't try to do just a F20 rebuild; do a pointless F21 rebuild too, just don't push it as an update'. Really there aren't many situations where you'd need to do a rebuild without any changes in the source package, and if you change the source package, eg to add a new patch, you probably want to do a F21 update too. I wave my hands.)

PS: I also originally thought that Ubuntu does this too, but no; while Ubuntu embeds 'ubuntu' in a lot of their package release numbers, it's not specific to the Ubuntu version involved and any number of packages don't have it. I assume it marks packages where Ubuntu deviates from the upstream Debian package in some way, eg included patches and so on.

by cks at April 09, 2015 04:51 AM

April 08, 2015

Everything Sysadmin

Usenix LISA has changed a lot in the last 5-10 years

I received an interesting email recently:

Did the submissions process for LISA change in recent years? I recall going to submit a talk a couple years ago and being really put off by the requirements for talks to be accompanied by a long paper, and be completely original and not previously presented elsewhere. Now it seems more in line with other industry conferences.

Yes, LISA is very different than it was years ago. If you haven't attended LISA in a while, you may not realize how different it is!

The conference used to be focused on papers with a few select "invited talks". A few years ago, the conference changed its focus to be great talks. LISA still accepts "original research" papers, but they're just one track in a much larger conference and have a separate review process. In fact, the conference now publishes both a Call for Participation and a separate Call for Research Papers and Posters.

If LISA is now "talk-centric", what kind of talks does it look for? Quoting from the Call for Participation, "We invite industry leaders to propose topics that demonstrate the present and future state of IT operations. [Talks should] inspire and motivate attendees to take actions that will positively impact their business operations." LISA looks for a diverse mix of speakers, not just gender diversity, but newcomers and experienced speakers alike. We have special help for first time speakers, including assistance with rehearsals and other forms of mentoring.

What about the papers that LISA does publish? The papers have different criteria than talks. They should "describe new techniques, tools, theories, and inventions, and present case histories that extend our understanding of system and network administration." Starting in 2014, the papers have been evaluated by a separate sub-committee of people with academic and research backgrounds. This has had an interesting side-effect: the overall quality of the papers has improved and become more research/forward-looking.

Because LISA mixes industry talks and research papers, attendees get to hear about new ideas long before they become mainstream. Researchers benefit by having the opportunity to network and get feedback from actual practitioners of system administration. This gives LISA a special something you don't find anywhere else.

Another thing that makes LISA better is the "open access" policy. Posters, papers, and presentations are available online at no charge. This gives your work wider visibility, opening up the potential to have greater impact on our industry. Not all conferences do this, not even all non-profit conferences do this.

Does that make you more interested in submitting a proposal?

We hope it does!

All proposal submissions are due by April 17, 2015.

  • Tom Limoncelli and Matt Simmons
  • (volunteer content-recruiters for LISA '15)

P.S. LISA has a new mission statement: LISA is the premier conference for IT operations, where systems engineers, operations professionals, and academic researchers share real-world knowledge about designing, building, and maintaining the critical systems of our interconnected world.

April 08, 2015 05:00 PM