Planet SysAdmin


March 11, 2010

High Scalability

Saying Yes to NoSQL; Going Steady with Cassandra at Digg

The last six months have been exciting for Digg's engineering team. We're working on a soup-to-nuts rewrite. Not only are we rewriting all our application code, but we're also rolling out a new client and server architecture. And if that doesn't sound like a big enough challenge, we're replacing most of our infrastructure components and moving away from LAMP.

Perhaps our most significant infrastructure change is abandoning MySQL in favor of a NoSQL alternative. To someone like me who's been building systems almost exclusively on relational databases for almost 20 years, this feels like a bold move.

What's Wrong with MySQL?

Our primary motivation for moving away from MySQL is the increasing difficulty of building a high-performance, write-intensive application on a data set that is growing quickly with no end in sight. This growth has forced us into horizontal and vertical partitioning strategies that have eliminated most of the value of a relational database, while still incurring all the overhead.

Relational database technology can be a blunt instrument and we're motivated to find a tool that matches our specific needs closely. Our domain area, news, doesn't exact strict consistency requirements, so (according to Brewer's theorem) relaxing this allows gains in availability and partition tolerance (i.e. operations completing, even in degraded system states). We're confident that our engineers can implement application level consistency controls much more efficiently than MySQL does generically.

As our system grows, it's important for us to span multiple data centers for redundancy and network performance and to add capacity or replace failed nodes with no downtime. We plan to continue using commodity hardware, and to continue assuming that it will fail regularly. All of this is increasingly difficult with MySQL.

 

by John Quinn at March 11, 2010 12:42 AM

March 10, 2010

Security Monkey

Guest Case File Alert: How A Laptop Bankrupted a Millionaire

Hi everyone, I know you're all frothing at the mouth for the final installments of the Tenacious Timelord casefile - however, it's still not ready. I really do apologize, but I'm up to my ears in work right now. To hold you over and to further my desire to entertain and educate, I wanted to introduce you to the work of Drew Janssen over at Drive Rescue, Inc. Drew does some impressive forensic investigations and has been documenting them as single-

March 10, 2010 05:27 PM

The Tech Teapot

Automated install comes to open source .NET projects

One of the nice things about Linux is the ability to install apps (and dependencies) very easily using apt-get or similar. Windows users have been missing a similar tool for a long time. Never fear, the Scottish Alt.Net group have written Hornget, a tool for installing open source .NET projects.

Quite a few projects are supported, though most are of interest only to programmers. It would be nice to see a lot more user oriented tools like games and the like.


by Jack Hughes at March 10, 2010 04:49 PM

Standalone Sysadmin

The woes of a small infrastructure admin…

Before I start, I just want you to know that I’m not whining, I just thought I’d give this as an example of some of the things that people who run small infrastructures are left out of…

Today I’m sitting in the office in NJ, doing work as normal. What I’d prefer to be doing is going to the IT Roadmap Conference & Expo in NYC. According to the website, it’s “designed for IT professionals who want to cover multiple industry topics in one day”. That sounds like something I’d be interested in!

Essentially, it’s a sales pitch, or a series of sales pitches. I don’t know if I’m in the market for what they’re selling, but I’d like to go find out what is being offered. All the same, I like to keep my eyes on the horizon, because things have a habit of coming up quick on us in IT, and if we don’t familiarize ourselves with the likely technology of the next few years, then we’ll be caught with our pants down. So I wanted to see what people were selling.

The conference is free. All you have to do is fill out the application for registration. Unfortunately, I don’t qualify:

Dear Matt,

Thank you for your interest in Network World Live’s IT Roadmap Conference & Expo in New York.

Unfortunately, after reviewing the information that you submitted, we determined that at this time, we are not able to confirm your seat on a complimentary basis.

As we noted on the registration form, this event is geared towards network and IT professionals in end-user type companies who actively purchase products and services – or – who will be doing so in the near future. We have a limited number of complimentary seats reserved for attendees who meet this criteria.

snip

Walk-ins or ineligible applicants arriving at the conference facility will NOT be admitted on the day of the event.
Thank you,

IT Roadmap Team
Network World Events & Executive Forums

(emphasis theirs)

Well, I do actively purchase technologies and products, but not at the scale that they’re looking for, I suppose. I don’t have 50 data centers, or “20,000 or more” servers, so I don’t get to go to their party and look at the toys.

It’s unfortunate for them and me, but somehow I think I’ll live. I just wanted to give you a tangible example of…well…I won’t go so far as to say discrimination, but maybe exclusion, that we small admins deal with from vendors.


by Matt Simmons at March 10, 2010 02:53 PM

High Scalability

How FarmVille Scales - The Follow-up

Several readers had follow-up questions in response to How FarmVille Scales to Harvest 75 Million Players a Month. Here are Luke's responses to those questions (and a few of mine).

How does social networking make things easier or harder?

The primary interesting aspect of social networking games is how you wind up with a graph of connected users who need to access each other's data on a frequent basis. This makes the overall dataset difficult if not impossible to partition.

What are examples of the Facebook calls you try to avoid and how they impact game play?

We can make a call for facebook friend data to retrieve information about your friends playing the game. Normally, we show a friend ladder at the bottom of the game that shows friend information, including name and facebook photo. 

Can you say where your cache is, what form it takes, and how much is cached? Do you have a peering relationship with Facebook, as one might expect at that bandwidth?

by Todd Hoff at March 10, 2010 02:53 PM

TechRepublic Network Administrator

The safety of numbers: Implementing filters on outgoing email

Derek Schauland recently ran into a support situation involving the filtering of outbound email messages. How much emphasis does your organization put on outbound filtering?

—————————————————————————————-

I had an interesting email issue recently at the office. A co-worker was trying to send an email with some attached financial information to an auditor in preparation for upcoming meetings. The information in the attachment was returned by our spam filter because it had a string of numbers in it that matched a pattern similar to that of a Social Security Number.

This problem, of course, was hard to explain to the user, who was just trying to do her job. To a user, it seems odd that we would care about patterns of numbers that weren’t actually social security numbers. However, the automation and regular expression editor at Postini doesn’t know if a number is or isn’t an SSN — it’s simply looking for that pattern.
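To make the false positive concrete, here is a minimal Python sketch of a pattern-only check. The regular expression is an assumption for illustration; the post does not show the rule Postini actually applies, and `looks_like_ssn` is a hypothetical helper name.

```python
import re

# Assumed pattern: 3 digits, 2 digits, 4 digits, separated by hyphens or
# spaces. The rule Postini really uses is not given in the post.
SSN_LIKE = re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b")


def looks_like_ssn(text):
    """Return True if the text contains a digit run shaped like an SSN."""
    return SSN_LIKE.search(text) is not None


# An invoice reference trips the filter even though it is not an SSN.
print(looks_like_ssn("Invoice ref 123-45-6789 for Q1"))
# Ordinary financial figures do not.
print(looks_like_ssn("Total due: 1,234.56"))
```

A filter like this only sees the shape of the digits, which is why a harmless attachment can bounce.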

Reasons to watch outgoing information

Until recently it hadn’t occurred to me that we might wish to look for things like this. Who would want to email their social security number anywhere? With all of the talk of identity theft going around, it seemed to be a no brainer to me; however, when Postini introduced an easy way to ensure the safety of these things, I was quick to turn it on.

I feel responsible for the Internet well-being of my users, and it’s possible that there are some who aren’t properly aware of the risks of transmitting their own personal information via email, even for legitimate purposes. But the real threat for many companies isn’t that their employees are clueless about handling sensitive data, but that there are those who might purposefully try to steal and transmit personally identifying information for their own profit.

Insiders might try to steal credit card numbers, account numbers, or even Social Security Numbers. Preventing these items from getting emailed out by simply configuring a filter to catch them is a first step in security. Obviously, this measure would only prevent the dumbest criminals from trying to steal from the company, but it’s a start.

When this new feature was added to the filter, it triggered a discussion about what should be done. When my reasons were given for adding these filters, the consensus was that this was a good idea, if for no other reason than to keep an accidental email from being sent out that contains a customer credit card number or other sensitive information.

Bottom line

Filtering inbound email for spam is a given to help keep employees productive, but helping to keep data secure by preventing certain types of information from being sent out is also a good practice. Taking the time to filter outbound mail and flagging potentially damaging contents might just be the thing that keeps an organization out of the courts.

Does your organization filter outbound email for any reason? How do you do it? Have you run into any problems?



by Derek Schauland at March 10, 2010 02:00 PM

TaoSecurity

Bejtlich OWASP Podcast Posted

My appearance on OWASP Podcast 61 is available.

The .mp3 is 36 MB. Thanks to Jim Manico for inviting me to participate.

We recorded the podcast in late January. Jim asked me the following questions:
  1. Would you care to tell us how you got into IT and what led you into a career in information security? What keeps you busy these days?
  2. What's the difference between focusing on threats vs focusing on vulnerabilities?
  3. What is your problem with the "protect the data" mindset?
  4. What do you mean by "building visibility in"?
  5. What is your take on the Aurora/Google hack?
  6. You just tweeted that "Network Security Monitoring ideology is the proper mechanism to combat APT/APA". Do you think network IPS/IDS/WAF can help defend insecure web applications? What are the limits of Network Security Monitoring?
  7. How important a role do you think secure coding and secure software development life-cycle play in defending the enterprise?
  8. Have HIPAA, PCI, SOX and other regulations helped reduce risk in the average enterprise?
  9. It seems pretty clear that attackers have a clear advantage. Why is that? How can we turn the tide?
  10. Any thoughts on OWASP? Are we helping the cause?
  11. Where are we going to be as an industry in 10 years?
  12. You blogged that "The trustworthiness of a digital asset is limited by the owner's capability to detect incidents compromising the integrity of that asset." Given that we don't have any high integrity database, identities or application servers - how do you detect a breach of integrity when there is no verifiable integrity in the system in the first place?

by Richard Bejtlich (noreply@blogger.com) at March 10, 2010 10:01 AM

Chris Siebenmann

Mythology about Unix workstations

Talking of Unix workstations, there's some mythology about them that seems to go around, or at least that may be going around and I feel like preemptively shooting down.

First, people who think that 1990s era Unix workstations were marvels of performance and features that have yet to be surpassed either have a very selective memory, were using very high end hardware from SGI, or never really used those workstations. I have used everything from Sun 3/50s onwards, and I can assure you that a modern PC that costs $500 smokes each and every one in terms of speed and features.

In fact, as I alluded to in passing in my original entry, old workstation hardware was actually rather terrible. It was not bad for the time (sometimes it was quite good), but it was not very good on an absolute scale and it was tolerable only because the software was equally limited so as not to exceed the hardware's capabilities. Let us not idolize the old days lest we be forced to live in them again, thanks.

The other piece of mythology is the idea that Unix workstation hardware was at least a marvel of niceness and good design compared to the hodgepodge and hacks of the current PC architecture. I am pretty sure that this was historically false; I certainly remember a whole stream of Usenix papers about what could basically be called 'the secret life of your hardware', where a number of kernel hackers wrote up bitter descriptions of exactly how bad various pieces of hardware were, such as Ethernet driver chipsets. Graphics were not exempt from this; for example, at the start of the 1990s, some DEC people wrote an entire paean about the advantages of an extremely simple framebuffer because its 2D performance beat the heck out of most of the then-current more complex graphics chipsets.

(Before you snort in disbelief at this, note that it was an 8-bit framebuffer. That was considered mainline or even advanced at the start of the 1990s, since at least you got 256 colours.)

I don't think that this should surprise anyone. People make design mistakes at the start of anything, because it takes time for them to figure out what really works and what just looks good on paper, and the Unix workstation era happened in the early times of people making (commodity) chipsets for most of the hardware capabilities that we now take for granted.

(The less said about various workstation vendor predecessors to SCSI the better, especially in the server space. I still remember our early 1990s decision to pass over this new, low-performing 'SCSI' stuff in favour of an advanced, fast IPI disk interface on our new Sun 4 server. This being a university, that server stayed in production long enough for our laughter to become rather hollow.)

by cks at March 10, 2010 07:57 AM

Slaptijack

SysAdmin1138

Tragic password policies

I just completed an order with Newegg for some personal computing equipment. That part was OK. What wasn't OK was the "Verified by Visa" thingy that popped up during the ordering process. My primary credit cards aren't Visa so I haven't seen that yet, despite shopping on sites with the verified by Visa logo on 'em. Since I hadn't used it before I had to set the durned thing up. Which meant picking a password.

My jaw dropped.

6-8 characters is stated in the 'password policy' that was posted. And no matter what I threw at it, if I used my shift key it wouldn't take the password. I don't know about you, but complex password policies have been around long enough that my fingers automatically go for the shift key when entering passwords. NOT using it took mental effort. In fact, the password I ended up with is markedly less secure than the one I use for throw-away accounts on web-sites I don't care about.

That is not a way to run a bank.

I don't know what "Verified by Visa" really provides, but whatever it is, password security isn't it.
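For a rough sense of scale, the Python sketch below compares the search space of an 8-character password with and without shifted characters. The alphabet sizes are assumptions: 36 for lowercase letters plus digits (the form's real character set is not stated in the post) and 94 for printable ASCII excluding space.

```python
import math


def search_space_bits(alphabet_size, length):
    """Bits of entropy for a uniformly random password of this length."""
    return length * math.log2(alphabet_size)


# Assumed alphabets; the form's actual allowed characters are not known.
no_shift = search_space_bits(36, 8)    # a-z and 0-9
with_shift = search_space_bits(94, 8)  # printable ASCII, no space

print(f"no shift:   {no_shift:.1f} bits")
print(f"with shift: {with_shift:.1f} bits")
print(f"difference: {with_shift - no_shift:.1f} bits")
```

Under these assumptions the gap is roughly 11 bits, which means a brute-force search over the restricted set is a couple of thousand times smaller.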

by SysAdmin1138 at March 10, 2010 02:27 AM

March 09, 2010

Steve Kemp's Blog

He's so mean he wouldn't light your pipe if his house was on fire.

By the time this blog entry goes live I'll be running upon my new machine. The migration process was mostly straightforward and followed my plan:

  • Using my existing desktop system as a PXE server to install Lenny over the network.
  • Copied over important directories.
  • Restored backups.
  • Turned off old machine.

Of course it wasn't that simple in practise. As previously mentioned, the whole reason I was looking for a new machine was that the software RAID upon my old desktop was failing: one of the two drives was completely dead.

As I'd feared the second drive failed partway through my migration. But thankfully I'd copied off the important stuff before then, and the backups I have off-site mostly covered everything else. (The things I lost were things I can find again such as ~/Music, ~/Videos. On the one hand they're too large to backup, on the other hand I should probably do it next time as they never change.)

Unfortunately the version of X in Lenny refused to work with the GeForce G210 video card I had. To be more correct using the Vesa driver I could get a picture and a smooth desktop, but when watching videos with xine I got maybe two frames a second. Both the open nv driver and the closed nvidia driver failed to support the card - so I swapped hardware, and I'm now running with the GeForce 7300 GS card from my previous desktop. This allows me to watch videos at full-screen with no issues. (Desktop size is 1600x1200 FWIW).

So now it's just a matter of tweaking the system. I've installed enough to be useful:

  • miredo - So I have IPv6 connectivity despite Virgin.
  • squid - So that I have a decent cache for surfing.
  • pdnsd - So I have a caching nameserver and am not at the whim of Virgin.
  • kvm - So I can setup scratch machines for play.

I've still got to setup pbuilder, but that'll be done shortly, and I've installed backported packages such that I can watch youtube videos. I'm currently running firefox from lenny but I expect that will change soon enough - not least because that version fails to support "adblockplus", only "adblock".

Two partitions md0 for /boot and md1 used as LVM, from which I've taken /, /home, etc:

Filesystem                       Size  Used  Avail  Use%  Mounted on
/dev/mapper/birthday--vol-root   9.9G  2.8G  6.6G   30%   /
/dev/mapper/birthday--vol-home   22G   4.3G  16G    22%   /home
/dev/mapper/birthday--vol-music  127G  43G   78G    36%   /mnt/music
/dev/md0                         988M  38M   901M   4%    /boot
/dev/mapper/birthday--vol-kvm    22G   8.8G  12G    44%   /mnt/kvm
/dev/sdg1                        163G  143G  12G    93%   /media/disk
skx@birthday:~/hg/blog/data$

 

skx@birthday:~/hg/blog/data$ sudo pvs
[sudo] password for skx:
  PV         VG           Fmt  Attr PSize   PFree
  /dev/md1   birthday-vol lvm2 a-   464.82G 274.51G

Update: Three irritations with this machine:

  1. As supplied the BIOS was set with "USB Mouse" and "USB Keyboard" set to "disabled". I had to beg the loan of a keyboard from a neighbour.
  2. As supplied the BIOS had virtualisation set to "disabled". Not a huge shock, but it caught me out regardless.
  3. As supplied the system had only a single SATA power connector. Annoying given that the motherboard is advertised as having "onboard RAID" and I'd purchased it with two hard drives. Happily I had a spare adaptor to hand.

I'd still recommend Novatech, but the last point had me swearing for a few minutes until I realised I did have a spare adaptor in the house.

ObFilm: Chitty Chitty Bang Bang

March 09, 2010 09:22 PM

Slaptijack

Gmail Search Power

Somehow, I have let my Gmail inbox get totally out of control. I don’t know if it was the holidays or what, but as of this morning, I had over 800 unread messages in my Gmail inbox. I decided a little while ago that it was time to focus on getting the situation resolved. Gmail’s excellent search capabilities will help me solve this problem in a snap. What I’m going to do is use Gmail’s search to create a list of unread messages in my inbox and then work through them as quickly as possible. By the way, keyboard shortcuts make this go a lot quicker!

I use labels extensively, and about 100 of these messages are labeled as either “security” or “cisco.” The emails in these labels need to be reviewed closely, so I’m going to go ahead and exclude them from this search. Therefore, the search below will find all unread messages in my inbox, unless they are labeled as “security” or “cisco.”

is:unread in:inbox -{label:security label:cisco}

This should be entered in the Gmail search box. The only really tricky things here are the curly braces ({}) and minus sign (-). The curly braces treat everything inside as if it were separated by a logical OR. In this case, it matches any message that is labeled “security” or “cisco.” Without the curly braces, a message would have to be in both labels to match. The minus sign lets Gmail know that I don’t want to see the results of that match in the final search results.
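The boolean logic of that search can be sketched in a few lines of Python. This only models the query against made-up message records; the field names (`unread`, `in_inbox`, `labels`) and the sample messages are hypothetical and have nothing to do with how Gmail stores or searches mail.

```python
EXCLUDED_LABELS = {"security", "cisco"}


def matches_search(message):
    """Model of: is:unread in:inbox -{label:security label:cisco}"""
    # Curly braces mean OR: having either label counts as a match...
    in_excluded = bool(message["labels"] & EXCLUDED_LABELS)
    # ...and the minus sign drops those matches from the results.
    return message["unread"] and message["in_inbox"] and not in_excluded


# Hypothetical sample data for illustration only.
messages = [
    {"subject": "Lunch?", "unread": True, "in_inbox": True, "labels": set()},
    {"subject": "IOS advisory", "unread": True, "in_inbox": True, "labels": {"cisco"}},
    {"subject": "Old news", "unread": False, "in_inbox": True, "labels": set()},
]

print([m["subject"] for m in messages if matches_search(m)])
```

Only the unread, unlabeled inbox message survives; the message labeled “cisco” is held back for closer review.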

by Scott Hebert at March 09, 2010 08:39 PM

TechRepublic Network Administrator

Cisco unveils new router to drive video: Big expectations ahead

This is a guest post from Sam Diaz, Senior Editor at ZDNet, TechRepublic’s sister site. You can follow Sam on his ZDNet blog Between the Lines (or subscribe to the RSS feed).

—————————————————————————————-

Cisco made a significant announcement today in its effort to revamp the Internet as we know it, launching a new networking router that has the power and the capacity to handle the demands of the next generation Internet.

The product is the Cisco CRS-3 Carrier Routing System, which is designed to be the “foundation” of the next-generation Internet, one that can set the pace for video growth. The device promises more than 12 times the traffic capacity of the closest competing system, with up to 322 terabits per second. How fast is that? The company said it enables the entire printed collection of the Library of Congress to be downloaded in just over one second.

OK, that’s fast - but why do we need this sort of speed and capacity?

This is less about the Internet that connects Web surfers; this is about the Internet’s backbone - a beefed-up pipeline that exceeds the sort of power that we actually need today, but prepares us for the growth that will come from the Internet as it relates to video and advanced communications.

The company also said that AT&T recently tested the CRS-3 in a successful completion of a field trial of 100-Gigabit backbone network technology. The CRS-3 is currently in other field trials.

In a Webcast announcement, CEO John Chambers talked about how changing needs drive this demand. This is about meeting the needs of a future generation of users, today’s kids who already see video and communications as part of our connected lives.

This is also about verticals such as health care or education or government and their needs to not only connect to each other for enhanced communications but with their customers, as well. On a business front, this is about the technology that will change everything from virtualization to collaboration.

John Chambers says this is a step in Cisco moving away from being just the plumber of the Internet to being a business partner, and adviser on how to bring new life to new technologies. Chambers has long said that the network is at the core of the Internet.

What was funny was that Chambers acknowledged this announcement won’t turn many heads among consumer-level Internet users. It’s boring backbone stuff for consumers - but it sets the stage for the Internet experience for the future.

As for Cisco’s attempt at drawing attention to this news, the company seemed to take a page from Apple’s playbook - and I don’t know that it was that effective. Cisco issued a press release yesterday, inviting the tech press to an online event for an announcement that would “change the Internet forever.”

This was a bit stiff, though - executives sitting around a table with another exec on the big screen, via Cisco’s telepresence technology. In some ways, this is the press conference of the future - a “casual” setting where executives sit around and talk about how evolutionary its new offering is while we all tune in.

Just by the announcement itself, this hype was a bit more than what was delivered. Sure, the news is important to the changing role of the Internet - but it’s no iPad announcement. And once we picked up the meat of the news, there really wasn’t much more there - it wasn’t like Chambers plopped down in a big comfy leather chair to demo the technology the way Steve Jobs might do.



by Sam Diaz at March 09, 2010 07:09 PM

:wq

Context searching using Clojure-OpenNLP

This is an addon to my previous post, “Natural Language Processing in Clojure with clojure-opennlp“. If you’re unfamiliar with NLP or the clojure-opennlp library, please read the previous post first.

In that post, I told you I was going to use clojure-opennlp to enhance a searching engine, so, let’s start with the problem!

The Problem

(Ideas first, code coming later. Be patient.)

So, let’s say you have a sentence:

“The override system is meant to deactivate the accelerator when the brake pedal is pressed.”

I took this sentence out of a NYTimes article talking about the recent Toyota recalls, just pretend it’s an interesting sentence.

Let’s say that you wanted to search for things containing the word “brake“; well, that’s easy for a computer to do right? But, let’s say you want to search for things around the word “brake“, wouldn’t that make the search better? You would be able to find a lot more articles/pages/books/whatever you might be interested in, instead of ones that only contained the key word.

So, in a naïve (for a computer) pass for words around the key word, we can come up with this:

“The override system is meant to deactivate the accelerator when the brake pedal is pressed.”

Right. Well. That really isn’t helpful, it’s not like the words “when”, “the” or “is” are going to help us find other things related to this topic. We got lucky with “pedal”, that will definitely help find things that are interesting, but might not actually have the word brake in them.

What we really need, is something that can pick words out of the sentence that we can also search for. Almost any human could do this trivially, so here’s what I’d probably pick:

“The override system is meant to deactivate the accelerator when the brake pedal is pressed.”

See the words in red? Those are the important terms I’d probably search for if I were trying to find more information about this topic. Notice anything special about them? Turns out, they’re all nouns or verbs. This is where NLP (Natural Language Processing) comes into play: given a sentence like the above, we can categorize it into its parts by doing what’s called POS-Tagging (POS == Parts Of Speech):

["The" "DT"] ["override" "NN"] ["system" "NN"] ["is" "VBZ"] ["meant" "VBN"] ["to" "TO"] ["deactivate" "VB"] ["the" "DT"] ["accelerator" "NN"] ["when" "WRB"] ["the" "DT"] ["brake" "NN"] ["pedal" "NN"] ["is" "VBZ"] ["pressed" "VBN"] ["." "."]

So next to each word, we can see what part of speech that word belongs to. Things starting with “NN” are nouns, things starting with “VB” are verbs. Doing this is non-trivial. (A full list of tags can be found here). That’s why there are software libraries written by people smarter than me for doing this sort of thing (*cough* opennlp *cough*). I’m just writing the Clojure wrappers for the library.

Anyway, back to the problem, so what I need to do is strip out all the non-noun and non-verb words in the sentence, that leaves us with this:

["override" "NN"] ["system" "NN"] ["is" "VBZ"] ["meant" "VBN"] ["deactivate" "VB"] ["accelerator" "NN"] ["brake" "NN"] ["pedal" "NN"] ["is" "VBZ"] ["pressed" "VBN"]

We’re getting closer, right? Now, searching for things like “is” probably won’t help, so let’s strip out all the words with fewer than 3 characters:

["override" "NN"] ["system" "NN"] ["meant" "VBN"] ["deactivate" "VB"] ["accelerator" "NN"] ["brake" "NN"] ["pedal" "NN"] ["pressed" "VBN"]
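The two filtering steps above can be sketched in Python. This is not the clojure-opennlp code; the tagged pairs are copied by hand from the example output, and the helper names are made up for illustration.

```python
# (word, tag) pairs copied from the tagged sentence above. A real tagger
# would produce these; here they are hard-coded for illustration.
TAGGED = [
    ("The", "DT"), ("override", "NN"), ("system", "NN"), ("is", "VBZ"),
    ("meant", "VBN"), ("to", "TO"), ("deactivate", "VB"), ("the", "DT"),
    ("accelerator", "NN"), ("when", "WRB"), ("the", "DT"), ("brake", "NN"),
    ("pedal", "NN"), ("is", "VBZ"), ("pressed", "VBN"), (".", "."),
]


def keep_nouns_and_verbs(tagged):
    """Keep tokens whose tag starts with NN (noun) or VB (verb)."""
    return [(w, t) for w, t in tagged if t.startswith(("NN", "VB"))]


def drop_short_words(tagged, min_len=3):
    """Drop words with fewer than min_len characters, such as "is"."""
    return [(w, t) for w, t in tagged if len(w) >= min_len]


candidates = drop_short_words(keep_nouns_and_verbs(TAGGED))
print([w for w, _ in candidates])
```

Running it leaves the same eight words as the list above, from “override” through “pressed”.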

Now we get to a point where we have to make some kind of decision about what terms to use, for this project, I decided to weight the terms that were nearer to the original search term, only taking two of the closest words in each direction, after scoring this sentence you get:

["override" 0] ["system" 0] ["meant" 0] ["deactivate" 0.25] ["accelerator" 0.5] ["brake" 1] ["pedal" 0.5] ["pressed" 0.25]

The number next to each word indicates how heavily we’ll weight each word when we use it for subsequent searches. The score is divided by 2 for each unit of distance away from the original key term. Not perfect, but it works pretty well.
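As a sketch of that scoring rule in Python (not the post's Clojure): the function name echoes the `score-words` function mentioned later, but the body is a guess at the rule as described, and it assumes the term appears once, using the first occurrence if it repeats.

```python
def score_words(term, words, window=2):
    """Score words by distance from term: halve per step, 0 past window."""
    if term not in words:
        return [[w, 0] for w in words]
    center = words.index(term)  # assumption: first occurrence only
    scored = []
    for i, word in enumerate(words):
        distance = abs(i - center)
        score = 1 / 2 ** distance if distance <= window else 0
        scored.append([word, score])
    return scored


WORDS = ["override", "system", "meant", "deactivate",
         "accelerator", "brake", "pedal", "pressed"]

for word, score in score_words("brake", WORDS):
    print(word, score)
```

With the eight filtered words as input this gives 1 for “brake”, 0.5 for its immediate neighbours, 0.25 for the next ones out, and 0 for the rest.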

That was pretty easy to follow, right? Now imagine having a large body of text, and a term. First you’d generate a list of key words from sentences directly containing the term, score them each and store them in a big map. On a second pass, you can start scoring each sentence by using the map of words => scores you’ve already built. In this way, you can then rank sentences, including those that don’t even contain the actual word you’ve searched for, but are still relevant to the original term. Now, my friend, you have context searching.
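The two passes can be sketched in Python as follows. Several things here are assumptions rather than the post's method: when a word is scored in more than one sentence the sketch keeps the highest score, and the keyword lists are hand-picked stand-ins for tagger output, not real results.

```python
def score_words(term, words, window=2):
    """Score each word by distance from term: 1, 1/2, 1/4, then 0."""
    center = words.index(term)
    return [
        (w, 1 / 2 ** abs(i - center) if abs(i - center) <= window else 0)
        for i, w in enumerate(words)
    ]


def get_scored_terms(sentences, term):
    """First pass: word -> score map from sentences containing term."""
    scores = {}
    for words in sentences:
        if term not in words:
            continue
        for word, score in score_words(term, words):
            # Assumption: keep the best score seen for each word.
            scores[word] = max(score, scores.get(word, 0))
    return scores


def score_sentence(words, scores):
    """Second pass: a sentence scores the sum of its words' scores."""
    return sum(scores.get(w, 0) for w in words)


# Hand-picked keyword lists (hypothetical, not real tagger output).
SENTENCES = [
    ["override", "system", "meant", "deactivate",
     "accelerator", "brake", "pedal", "pressed"],
    ["administration", "considering", "requiring", "automobiles", "contain",
     "brake", "override", "system", "intended", "prevent"],
    ["called", "pedal", "feature", "found", "automobiles", "sold"],
]

scores = get_scored_terms(SENTENCES, "brake")
totals = [score_sentence(s, scores) for s in SENTENCES]
print(totals)
```

The third list never mentions “brake” yet still earns a score through “pedal” and “automobiles”, which is the whole point of the second pass.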

Congratulations, that was a really long explanation. It ended up being a bit more like pseudocode, right? (or at least an idea of how to construct a program). Hopefully after understanding the previous explanation, the code should be easy to read.

The Code

(note: this is very rough code, I know it’s not entirely idiomatic and has a ton of room for enhancement and parallelization, feel free to suggest improvements in the comments however!)

The most important functions to check out are the (score-words [term words]) method, which returns a list of vectors of words and their score, and the (get-scored-terms [text term]) method, which returns a map of words as keys and scores as values for the entire text, given the initial term.

Here’s the output from the last few lines:

contextfinder=> (pprint (reverse (sort-by second (score-sentences mytext scorewords))))
(["The override system is meant to deactivate the accelerator when the brake pedal is pressed. " 13/4]
["The Obama administration is considering requiring all automobiles to contain a brake override system intended to prevent sudden acceleration episodes like those that have led to the recall of millions of Toyotas, the Transportation secretary, Ray LaHood, said Tuesday. " 5/2]
["Often called a \"smart pedal,\" the feature is already found on many automobiles sold worldwide, including models from BMW, Chrysler, Mercedes-Benz, Nissan and Volkswagen." 3/4]
["That will let the driver stop safely even if the cars throttle sticks open. " 0])

I highlighted the score for each sentence in red, see how sentences such as the third in the list have a score, but don’t contain the word “brake”? Those would have been missed entirely without this kind of searching.

contextfinder=> (println (score-text mytext scorewords))
13/2

Score a whole block of text, self-explanatory.

Anyway, I’ve already spent hours writing the explanation for why you’d want to do this sort of searching, so I think I’ll save the explanation of the code for another post. Hopefully it’s clear enough to stand on its own. If you find this interesting, I encourage you to check out the clojure-opennlp bindings and start building other cool linguistic tools!

by Lee at March 09, 2010 06:31 PM

cmdln.org

Sysstat Sar Performance Metrics Via Nagios Plugin

I know I’ve mentioned how much I love the sysstat package before. I use sar regularly to help with performance diagnostics (Analyzing Linux System Performance And Finding Bottle Necks, CPU Performance Analysis In Linux, Baseline Analysis Is Important, CPU Performance Analysis In Linux Revisited). I wrote this little Nagios plugin to collect the performance metrics that sar [...]

by Nick Anderson at March 09, 2010 03:42 PM

High Scalability

Sponsored Post: Job Openings - Squarespace

Squarespace Looking for Full-time Scaling Expert

Interested in helping a cutting-edge, high-growth startup scale? Squarespace, which was profiled here last year in Squarespace Architecture - A Grid Handles Hundreds of Millions of Requests a Month and also hosts this blog, is currently in the market for a crack scalability engineer to help build out its cloud infrastructure. Squarespace is very excited about finding a full-time scaling expert.

Interested applicants should go to http://www.squarespace.com/jobs-software-engineer for more information.



If you would like to advertise your critical, hard to fill job openings on HighScalability, please contact us and we'll get it set up for you.

by Todd Hoff at March 09, 2010 03:16 PM

Applications as Virtual States

This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud.

As I was writing an article on the architecture of the Storm Botnet, I couldn't help but notice the deep similarity between how Storm works and the changes we're seeing in the evolution of political systems. In particular, the rise of the virtual-state. As crazy as this may sound, I think this is also the direction applications will need to follow to survive in a complex world of billions of compute devices.

You may have already heard of virtual corporations. Virtual corporations are companies with limited office space, a distributed workforce, and production facilities located wherever it is profitable to locate them. The idea is to stay lean and compete using the rapid development and introduction of new products into high value-added markets. If you spot a market opportunity with a small time window, building your own factories and hiring an engineering team simply isn't an option. Building factories is a bit old fashioned and is left to the select few. These days you get an idea for a product and contract out everything else you possibly can. It doesn't really matter where you are located or where any of your partners are located. If part of your product requires a specialized microprocessor, for example, you'll contract out the R&D and the design. The manufacture will be contracted out to a virtual fab, then the chip will be sent to a contract manufacturing service for integration. Look ma, no hands.

Futurists say land doesn't matter anymore. Nations don't matter anymore. Entire relationships are abstractly represented by flows of money, contracts, information, and products between all these different agents. Interestingly enough, what technology is the absolute master of managing flows? Applications! But we are getting ahead of ourselves here.

by Todd Hoff at March 09, 2010 03:07 PM

TechRepublic Network Administrator

Ethernet over power lines: Netgear makes major improvements

Using in-house electrical wiring for networking computers usually only happens if wires can’t be run or Wi-Fi connections are less than adequate. Netgear’s new Ethernet over power-line devices may change that.

————————————————————————————————–

Last year, I went through what I consider the perfect storm of network cabling. To explain, I was asked by a client to set up an Ethernet network at one of his rental facilities. For some reason, I could not run cables. To make matters worse, there was an inordinate amount of grounded metal (galvanized studs) acting like RF sponges. That eliminated Wi-Fi gear as an option.

Out of options, I tried Netgear’s Powerline equipment and was disappointed. Bandwidth never came close to what Netgear advertised. Still, the client was not deterred by the limited throughput. They were happy to have anything at all. I guess sneaker networks get old fast.

New and improved

Previously, Netgear offered two product lines, one rated at 85 Mb per second and one at 200 Mb per second. Both were lucky to achieve half that throughput. Still, Netgear seems determined to make Powerline Ethernet a viable solution. They just announced a new product line that may take care of the bandwidth problem. The new adapters have the following enhancements:

  • Throughput speed of 500 Mb per second.
  • Prioritized Quality of Service (QoS), important for streaming media applications.
  • Simple 128-Bit AES encryption, using the “Push-and-secure” button.
  • Backward compatible with other Netgear Powerline products and equipment from other vendors, if it’s HomePlug AV certified.

Two models

Netgear is offering two models, the Powerline AV 500 Adapter Kit/XAVB5001 (courtesy of Netgear):

As well as the Powerline AV+ 500 Adapter Kit/XAVB5501 (courtesy of Netgear):

Netgear mentions that the devices are designed to leave the second socket of an outlet pair open for use. Also, the XAVB5501 provides a filtered power socket, if outlets are in short supply. Each kit comes with two adapters.

Final thoughts

I just ran an Ethernet cable about 20 meters to get network access to our main HDTV in the living room. I first tried using a Wi-Fi link. But it required a repeater, which cut throughput enough to cause buffering. I’m thinking the new Powerline adapters would have saved a lot of work.




by Michael Kassner at March 09, 2010 02:00 PM

TechRepublic IT Security

Microsoft warns: Don't press F1

As of the first of March, Microsoft had released a security advisory related to the use of the F1 key while using its Internet Explorer browser.


With any luck, millions of Microsoft Windows computers should get a patch this Patch Tuesday for a VBScript vulnerability that could allow a remote attacker to take over the computer. So far, it seems that there are no exploits in the wild, as noted in Microsoft’s security advisory:

Microsoft is investigating new public reports of a vulnerability in VBScript that is exposed on supported versions of Microsoft Windows 2000, Windows XP, and Windows Server 2003 through the use of Internet Explorer. Our investigation has shown that the vulnerability cannot be exploited on Windows 7, Windows Server 2008 R2, Windows Vista, or Windows Server 2008. The main impact of the vulnerability is remote code execution. We are not aware of attacks that try to use the reported vulnerabilities or of customer impact at this time.

Microsoft says that the nature of the vulnerability is tied to “the way VBScript interacts with Windows Help Files when using Internet Explorer.” Unless and until that vulnerability gets patched, the workaround to protect yourself is simply to avoid using the F1 key:

If a malicious Web site displayed a specially crafted dialog box and a user pressed the F1 key, arbitrary code could be executed in the security context of the currently logged-on user.

TechRepublic contributing writer Sterling Camden’s take on the issue offers vivid illustration of the problem:

I can imagine Grandma sitting in front of a page that says, “Your computer’s LHC has encountered fatal hard drive saturation. Press F1 for more information.”

Microsoft identifies the following as affected software:

  • Microsoft Windows 2000 Service Pack 4
  • Windows XP Service Pack 2, Windows XP Service Pack 3, and Windows XP Professional x64 Edition Service Pack 2
  • Windows Server 2003 Service Pack 2, Windows Server 2003 with SP2 for Itanium-based Systems, and Windows Server 2003 x64 Edition Service Pack 2

A very vague reassurance is offered for MS Windows Server 2003 users as well:

On systems running Windows Server 2003, Internet Explorer Enhanced Security Configuration is enabled by default, which helps to mitigate against this issue.

Further information is available at the CVE advisory page for this vulnerability:

VBScript in Microsoft Windows 2000 SP4, XP SP2 and SP3, and Server 2003 SP2, when Internet Explorer is used, allows user-assisted remote attackers to execute arbitrary code by referencing a (1) local pathname, (2) UNC share pathname, or (3) WebDAV server with a crafted .hlp file in the fourth argument (aka helpfile argument) to the MsgBox function, leading to code execution involving winhlp32.exe when the F1 key is pressed.

Remedial vulnerability handling

It should come as no surprise that Microsoft continues to stick to its “responsible disclosure” guns on this matter. It has not failed to use its security advisory as a platform for trying to chastise security researchers:

Microsoft is concerned that this new report of a vulnerability was not responsibly disclosed, potentially putting computer users at risk. We continue to encourage responsible disclosure of vulnerabilities. We believe the commonly accepted practice of reporting vulnerabilities directly to a vendor serves everyone’s best interests. This practice helps to ensure that customers receive comprehensive, high-quality updates for security vulnerabilities without exposure to malicious attackers while the update is being developed.

While this may seem perfectly reasonable at first glance, it is easy to read between the lines and see what Microsoft really wants from security researchers: silence. As pointed out in “How should we handle security notifications?” there is a strong argument for sharing as much information as possible with end users whenever a new vulnerability is discovered, so that they may employ workarounds to ensure they do not suffer the ill effects of using vulnerable software in an unsafe manner. Microsoft’s track record is one of attempting to punish any security researchers who do so, preferring researchers to inform nobody outside of Microsoft itself, then sit down and shut up.

By browsing through security advisory archives for researchers who abide by “responsible disclosure” as defined by corporations like Microsoft, one gets a pretty clear view of the end result of such a policy. Archives such as those of eEye Digital Security show that when nobody outside of such a security researcher and Microsoft employees knows anything about a given vulnerability, it is all too common that the vulnerability may go eighteen months or longer without getting patched. The most common length for the period between an eEye report date and a Microsoft patch date is more than 100 days (about three and a half months or more).

As of this writing, in fact, the most recent eEye vulnerability discovery was a remote code execution issue rated as High severity, and it took Microsoft 107 days to get around to fixing it after being notified by eEye Digital Security. To many security professionals, this kind of casual delay is regarded as an almost criminal shirking of responsibility for software security.

Protect yourself

In the long run, the application of market forces, either as encouragement for Microsoft to change its ways or in favor of some competitor with a more conscientious approach to dealing with security vulnerabilities, is really the only way to solve the problem of such laxity. For all its talk about improving security policies, procedures, and design in recent years, Microsoft clearly has a long way to go before it can actually be regarded as a good example of a software vendor that handles security in a competent and ethical manner. It is only when its customers actually know about a vulnerability that the corporate software giant, like most other large software vendors, can be moved to swift action.

In the short term, however, we have been granted a piece of helpful information about the vulnerability of MS Windows via Internet Explorer, and should make use of it. The smallest adjustment to normal IE use to protect yourself would involve simply refusing to press the F1 button while using that Web browser. A more significant (and potentially more effective) adjustment for many would be to simply avoid using Internet Explorer at all, and choose some other browser in its stead.

For those of you using another operating system entirely, or a sufficiently new release version of MS Windows, this security advisory is probably not relevant.



by Chad Perrin at March 09, 2010 02:00 PM

SysAdmin1138

They've got a point

Yesterday on El Reg was a nice article about the sorry state of the stand-alone mail client. WebMail has captured what little email people do while not at work, and the in-application messaging features of certain large social networking sites are supplying most of the rest of the private asynchronous chat messaging people are doing. And yes, I'm seeing a lot less non mail-list traffic in my private mailboxes than I was 10 years ago (of course, 10 years ago I was also still on Usenet. For the articles. Really!). Of the messages that aren't list-traffic, the rest are the usual assortment of semi-legit come-ons and a very large percentage of status update type messages from various social networking sites.

Anyway, stand-alone email is not getting the developer attention it once was. The Register article pointed out on page 2 that Opera has a surprisingly good mail client hiding in it. And they're right, it's pretty darned good. I'm using it at home in preference to Thunderbird even. I keep Thunderbird around for those exceedingly rare cases when I need either GPG or S/MIME for something, a feature Opera hasn't gotten around to dealing with yet and probably never will. But for simple email management, the mail client in Opera really is quite good.

by SysAdmin1138 at March 09, 2010 07:31 AM

Chris Siebenmann

How not to design an API (in C): the enum ordering mistake

Suppose that you are creating an API in C and that you have a return value that is just right for an enum; for example, it communicates either 'all is okay' or some range of errors and exceptional conditions. Here's how not to write this API:

typedef enum { ERROR_1, ERROR_2, ERROR_3, ALL_OK } error_t;

You don't want to do this, because sooner or later you're going to want to add another error condition, ERROR_4, and the end result of putting it after ALL_OK is going to look somewhere between ugly and stupid.

The rule of thumb with enums and similar objects is that the fixed point goes at the start of the range. You are unlikely to have more than one 'all is fine' return code, so it is the fixed point and goes at the start.

The extra special way not to design this API is to do this and then just put ERROR_4 where it belongs, ie before ALL_OK. If you do this, any number of people will throttle you because you have just destroyed binary compatibility by renumbering ALL_OK's actual value. Worse, the broken binary compatibility may be subtle, depending on where and how people use the enum, since only one value has shifted.

(Admittedly this is only an issue in C and similar compiled languages that turn enums into actual integers behind the scenes. In other languages, this confusion can't happen; either ALL_OK is silently renumbered in all code that's using it or ALL_OK is purely a symbol with no numeric value attached to it as such.)

You would think that people wouldn't do this. Sadly, I have just seen this mistake made in software from a major vendor, assuming that it was a mistake instead of a deliberate decision to subtly punish people who counted on binary compatibility when it wasn't documented.

(PS: if you want to punish these people, it is much more productive and direct to spectacularly break your ABI so that people can't help but notice. People are kind of slow to notice subtle problems and they may not even realize what's going on for some time.)

by cks at March 09, 2010 05:32 AM

Racker Hacker

Rackspace Cloud Tech Podcast Episode 2

I participated in a podcast for the Rackspace Cloud with Robert Collazo last week. We covered some important topics including network security and convenient deployment tools.

©2010 Racker Hacker. All Rights Reserved.

by Major Hayden at March 09, 2010 01:51 AM

March 08, 2010

:wq

Natural Language Processing in Clojure with clojure-opennlp

NOTE: I am not a linguist, please feel free to correct me in the comments if I use the wrong term!

From Wikipedia:

Natural Language processing (NLP) is a field of computer science and linguistics concerned with the interactions between computers and human (natural) languages. Natural language generation systems convert information from computer databases into readable human language. Natural language understanding systems convert samples of human language into more formal representations such as parse trees or first-order logic structures that are easier for computer programs to manipulate. Many problems within NLP apply to both generation and understanding; for example, a computer must be able to model morphology (the structure of words) in order to understand an English sentence, and a model of morphology is also needed for producing a grammatically correct English sentence.

Clojure-opennlp is a library to interface with the OpenNLP (Open Natural Language Processing) library of functions, which provides linguistic tools for processing blocks of text. Once a linguistic interpretation of text is possible, a lot of really interesting applications present themselves. Let’s jump right in!

Basic Example usage (from a REPL)

(use 'clojure.contrib.pprint) ; just for this example
(use 'opennlp.nlp) ; make sure opennlp.jar is in your classpath

You will need to make the processing functions using the model files. These assume you’re running from the root project directory of the git repository (where some models are included). You can also download the model files from the opennlp project at http://opennlp.sourceforge.net/models/

user=> (def get-sentences (make-sentence-detector "models/EnglishSD.bin.gz"))
user=> (def tokenize (make-tokenizer "models/EnglishTok.bin.gz"))
user=> (def pos-tag (make-pos-tagger "models/tag.bin.gz"))

For name-finders in particular, it’s possible to have multiple model files:

user=> (def name-find (make-name-finder "models/namefind/person.bin.gz" "models/namefind/organization.bin.gz"))

The (make-<whateverizer> "modelfile.bin.gz") functions return functions that perform the linguistic processing. I decided to have them return functions so multiple functions doing the same sort of action could be created with different model files (perhaps different language models and such) without having to pass the model file every time you wanted to process some text.

After creating the utility functions, we can use them to perform operations on text. For instance, since we defined the sentence-detector as ‘get-sentences’, we can use that function to split text into sentences:

user=> (pprint (get-sentences "First sentence. Second sentence? Here is another one. And so on and so forth - you get the idea..."))
["First sentence. ", "Second sentence? ", "Here is another one. ",
"And so on and so forth - you get the idea..."]
nil

Or split a sentence into tokens using the tokenize function:

user=> (pprint (tokenize "Mr. Smith gave a car to his son on Friday"))
["Mr.", "Smith", "gave", "a", "car", "to", "his", "son", "on",
"Friday"]
nil

Once we have a sequence of tokens, we can do what’s called POS Tagging. POS Tagging takes a list of words from only one sentence and applies an algorithm (using the morphology model) to determine what kind of tag to apply to each word:

user=> (pprint (pos-tag (tokenize "Mr. Smith gave a car to his son on Friday.")))
(["Mr." "NNP"]
["Smith" "NNP"]
["gave" "VBD"]
["a" "DT"]
["car" "NN"]
["to" "TO"]
["his" "PRP$"]
["son" "NN"]
["on" "IN"]
["Friday." "NNP"])
nil

You can check out a list of all the tags if you want to know what they stand for.

The clojure-opennlp library also features a name finder, however it is extremely rudimentary at this point and won’t detect all names:

user=> (name-find (tokenize "My name is Lee, not John."))
("Lee" "John")

Filters

In the library, I also provide some simple filters that can be used to pare down a list of pos-tagged tokens using regular expressions. There are some preset filters available, as well as a macro for generating your own filters:

(use 'opennlp.tools.filters)

user=> (pprint (nouns (pos-tag (tokenize "Mr. Smith gave a car to his son on Friday."))))
(["Mr." "NNP"]
["Smith" "NNP"]
["car" "NN"]
["son" "NN"]
["Friday" "NNP"])
nil
user=> (pprint (verbs (pos-tag (tokenize "Mr. Smith gave a car to his son on Friday."))))
(["gave" "VBD"])
nil

Creating your own filter:

user=> (pos-filter determiners #"^DT")
#'user/determiners
user=> (doc determiners)
-------------------------
user/determiners
([elements__52__auto__])
Given a list of pos-tagged elements, return only the determiners in a list.
nil
user=> (pprint (determiners (pos-tag (tokenize "Mr. Smith gave a car to his son on Friday."))))
(["a" "DT"])
nil

Check out the filters.clj file for a full list of out-of-the-box filters.

That’s about all there is in the library at the moment, so I hope that made sense. Unfortunately clojars.org does not provide a nice way to publish documentation for a library, so the documentation in this post and on the github page will have to do for now.

This library is available on clojars for inclusion in leiningen projects, or on github if you’re interested in the source. This is a fairly new project, and not all OpenNLP features are exposed at the moment, so feedback is definitely encouraged. In the next post I’ll explain an in-depth example of how these functions can be used to enhance a search engine. EDIT: It’s up! Check out “Context searching using clojure-opennlp.”

UPDATE: Hiredman has let me know that the jar on clojars is missing the 3 dependencies used for the library. I’m busy working on writing pom.xml’s for the jars so I can upload them to clojars as dependencies. In the meantime, make sure you have the 3 jars in the lib directory (of the github project) in your classpath. Feel free to report any other issues on the github tracker or in the comments.

UPDATE 2: I fixed the project.clj file and pushed new versions of opennlp.jar and the dependency jars. A regular ‘lein deps’ should work now.

by Lee at March 08, 2010 08:51 PM

Everything Sysadmin

LOPSA Conference schedule published!

If you were waiting to register until the complete schedule was revealed, get that credit card out!

LOPSA PICC last night published the final slate of papers and speakers (if you didn't get your accept/sorry email, please let us know). http://picconf.org now contains the complete schedule.

You can attend for as little as $249, or $99 for students. The training program is extra.

If you aren't sure how to ask your boss for permission, we have some advice.

Tom

by Tom Limoncelli at March 08, 2010 06:53 PM

TechRepublic IT Security

The unwelcome return of image spam

As soon as spam fighters gain ground, spammers switch tactics, sometimes going back to what worked before. Hence the return of image spam.

——————————————————————————————————————————–

Image spam has been around for many years, having spurts of popularity when spam filters get good at detecting normal types of spam. Each resurgence has seen image spam increase in sophistication. To try and understand image spam, I started talking to the people at Red Condor, a well-known spam filtering service.

Brien Voorhees, one of the founders of Red Condor, was kind enough to answer my numerous questions. Let’s see what he has to say about image spam:

TechRepublic: I keep hearing about image spam and how spammers are using it to get past filters. What is image spam?

Voorhees: It’s a spam email where the spammer’s message or pitch is represented in an attached/embedded image instead of text. Often, the email will also have unrelated text in the body of the message to throw off filters, but the actual pitch will be in the image.

TechRepublic: Why is image spam so difficult to detect?

Voorhees: The purpose of spam is to get the user to take some kind of action, whether it’s clicking on a link to buy a product, calling a phone number, or replying to an email address. The spammer can randomize the content of their messages and where they come from, but it’s difficult to randomize the actual call to action.

Any kind of consistency in a spam campaign can be used by a filter to identify, target, and block the campaign. When the “call to action” is displayed visually, the computer can’t recognize it without computationally expensive Optical Character Recognition (OCR) processing. The images are almost always randomized to some degree to prevent OCR and also make each image unique.

Since an email containing a spam image looks almost identical to one with a picture of your grandkids, it’s extremely difficult to block them without also causing a lot of collateral damage (false-positives).

TechRepublic: What processes does Red Condor have in place to filter image spam?

Voorhees: Red Condor employs several different technologies to effectively block image spam without also blocking good images:

  • Image fingerprinting: We have fast, efficient “fuzzy” matching algorithms that can target specific areas of an image.
  • Our system also looks at the reputation of the IP address delivering the message and can be stricter if the message has image spam characteristics.
  • Continuous feedback loops: Including humans in the review process.

In the near future, Red Condor will be introducing a new layer to its image spam defenses. While more details will come out soon, I can say the new layer will successfully identify image-spam campaigns based on a unique combination of structural elements present in both the image and the message.

TechRepublic: I understand that one of the new techniques is to just use an image, no text in the subject line or anywhere in the body. What is the purpose of that?

Voorhees: I haven’t noticed that as a very common technique. Most of the campaigns I see do have some amount of (unrelated) text. Regardless of the message body, a typical image spam ends up looking very similar to an email sent by a human. The spammers are good at making the message look like it was sent from a real person using Outlook, etc., and the messages are delivered from various infected machines (as part of a botnet) instead of the spammer’s own computers.

TechRepublic: Image spam was prevalent several years ago. Then it tapered off; was that because it required the victim to manually enter link information?

Voorhees: Yes, the “click on the link” requirement is a huge drawback to the image spam technique and makes it not desirable for most spammers. The really big image campaigns years ago were primarily the stock pump-and-dump campaigns.

It was a good technique for them because the “call to action” wasn’t for the user to click on a specific link, but rather to go to their broker (offline or online) and buy the stock. The major stock spammers eventually got sued or arrested and the image spam levels dropped to a fraction of what they were.

TechRepublic: Why do you think image spam is making such a strong comeback?

Voorhees: While image spam is making a bit of a comeback it’s nowhere near the level it was several years ago. As for why, I think it’s partly desperation. Filtering technology has gotten pretty good in recent years and the spammers are constantly looking for any chink in the armor.

Image spam is still one of the most difficult types to accurately block. Due to image spam’s inherent disadvantages though, I think it will continue to be an annoyance, but not the majority of spam.

TechRepublic: Do you have any more thoughts about image spam?

Voorhees: Since the early days of image spam, it’s been interesting to watch it change, usually in response to filter adaptation; basically evolution in action:

  • In the beginning, all of the images would be identical. Then they started to add some simple randomization to the image “header” and/or palette to defeat basic fingerprinting (MD5 hash, etc).
  • Next, they started to scale the image to varying sizes.
  • Then they added a small amount of obfuscation noise, varying background colors, even tilting the image slightly.
  • For a while, spammers were also using animated GIFs or slicing the image up into several smaller sections.
  • They also tried delivering the images by embedding them inside PDFs.

Over time, the randomization has become more and more extreme to the point where it is difficult to discern the content (similar to some CAPTCHAs). Recent image spam campaigns employ multiple techniques together with color changes, scaling, noise, and waving. That makes it basically impossible to use OCR to decode the text.

Below is an example of what Mr. Voorhees is referring to:

TechRepublic: Changing the subject, I am curious as to how Red Condor came into being?

Voorhees: Myself and two other engineers were wrapping up some contract work and had been keeping an eye out for the right opportunity to create our own product and company. Spam was starting to become a real annoyance at the time, making it difficult to find real messages lost among the junk.

We checked out the available filtering options and didn’t find anything satisfactory. They forced the user to make a choice between letting too much spam through or blocking too many good emails. It has always surprised me how accepting many filtering companies (and some users) are of false-positives.

Personally, I consider all of my email to be critical and one lost message is one too many. As a group we recognized the opportunity to create a new spam filter that would meet our own standards.

Final thoughts

I like the explanation of how spam requires a “call to action.” It defines what is needed to make spam work. Image spam, by its nature, doesn’t provide an easy way to accomplish that. Yet it’s hard to detect image spam. Since the use of image spam is increasing, spammers must feel getting the spam in front of us is more important.

I would like to extend my thanks to Tim McAllister of Red Condor for pointing me in the right direction, Kevin Wilson of KevinWilsonpr.com for making it all happen, and finally Brien Voorhees for his insight into the world of spam.




by Michael Kassner at March 08, 2010 04:58 PM

Steve Kemp's Blog

You Greeks take pride in your logic. I suggest you employ it.

Tomorrow, all being well, I'll receive a new computer.

I've always run Debian unstable upon my desktop in the past, partly because I wanted to have "new stuff" and partly because I needed a Debian unstable system for building Debian packages with.

However I'm strongly tempted to just install Lenny. I use that upon my work desktop and it does me just fine for surfing, building tools, and similar.

I can use pbuilder, sbuild, or similar to build packages for upload to Debian, and if I want to experiment with new-hotness I can use a KVM guest or two.

Providing the hardware works with Lenny (and I have no reason to believe it won't) then there's no obvious downside I can think of.

The only potential complication will be restoring my backups: it is possible that my firefox databases, and similar things, might not work on an older version. Still, we shall see.

I plan to install software RAID, and run the system on LVM because quite frankly it rocks. Unless my current system fails in the next 24 hours I can use that to do the installation (My current desktop acts as a TFTP/DHCP/NFS server so I can use it to PXE-boot).

Anyway now I need to go eat food, tidy my desk, and decide what to call the machine .. At the moment the choice is between "march.my.flat" and "birthday.my.flat", as my 34th birthday is on March 10th.

ObFilm: 300

March 08, 2010 03:11 PM

cmdln.org

XenServer License Check – Nagios NRPE Plugin

If you hadn’t already guessed I am a big fan of the Xen hypervisor. Lately I have been using the Citrix XenServer release because it makes it quite palatable for my co-workers. One annoyance that I do have about XenServer is the requirement that you license it (with a free license) every year. If you [...]

by Nick Anderson at March 08, 2010 02:51 PM

TechRepublic Network Administrator

Trading bandwidth for infrastructure when moving to the cloud?

Considering moving to the cloud? Have you worked out the cost models? That’s the question that needs to be hammered out first. IT guru Rick Vanover breaks down the trade-off between infrastructure and bandwidth.

—————————————————————————————

Simply speaking, if you are going to move to the cloud with anything there will likely be some sort of bandwidth change required. I believe that one of the easiest entry points for a cloud solution is a storage offering that functions as a data protection mechanism. The challenge is, is the bandwidth available to transport significant amounts of data to the cloud?

Products are now available that extend data protection to the cloud. One example is Commvault’s Simpana cloud-enabled integration, which lets administrators move data protection tiers to a choice of cloud providers. Figure A shows this new feature with the current product:
Figure A

Don’t get me wrong, I like the concept. The issue is how much can we push through the wire to the cloud storage provider? Sure, the data is deduplicated and encrypted, but what kind of connectivity do we really need to transfer the data outbound to the cloud?

One of my colleagues in the blogosphere, Greg Ferro of Ethereal Mind, even confirmed one of my suspicions in a recent Twitter post. Greg concluded that a cloud solution ended up costing twice what running in-house would have. While every cloud solution is different, it gets me thinking – what about the bandwidth? Providing 50 or 100 Megabits for outbound Internet connectivity has been more than enough for most workplaces up to this point. But the moment a storage transfer is rolled into the mix, things change substantially.

Further, what about the initial load? Does it make more sense to only start with a select footprint of new systems? Depending on many factors, the initial transfer of a data protection mechanism could be in the Terabytes.
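
To put rough numbers on the initial-load question, here is a minimal back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes) and, unless told otherwise, that the whole link is available to the transfer; the function name and the `efficiency` parameter are illustrative and not taken from any product mentioned above.

```python
def transfer_hours(terabytes, megabits_per_second, efficiency=1.0):
    """Hours needed to push `terabytes` of data through a link.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Mbit = 10**6 bits).
    `efficiency` is the assumed usable fraction of the link (1.0 = all of it).
    """
    bits = terabytes * 10**12 * 8
    usable_bits_per_second = megabits_per_second * 10**6 * efficiency
    return bits / usable_bits_per_second / 3600.0


# The two outbound link speeds mentioned in the post.
for link in (50, 100):
    print("1 TB over %d Mbit/s: about %.1f hours" % (link, transfer_hours(1, link)))
```

Under these assumptions a single terabyte takes roughly 22 hours at 100 Megabits and roughly 44 hours at 50, before accounting for other traffic sharing the link or for any savings from deduplication.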

With all of these concerns, there are opportunities. It would be nice to retire all tape devices in the enterprise. This, of course, is depending on your interpretation of data at rest off-site. Does cloud storage meet this criterion? If your tape drive is in a warehouse, you can’t go physically to the location and inside the vault to retrieve the device. How does a cloud storage provider differ from that? In previous posts on cloud computing, the TechRepublic community has not been shy to point out the loss of control with cloud solutions.

Does data protection appeal to you as a cloud solution? If so, how would you use it? Please share your comments below.



by Rick Vanover at March 08, 2010 02:38 PM

A Year in the Life of a BSD Guru

Quick Poll: What would you like to see from the book?

I have permission to include a few pages from the Definitive Guide to PC-BSD in the next issue of BSD Mag and could use some help determining what to submit.

March 08, 2010 02:13 PM

TaoSecurity

Traffic Talk 10 Posted

I just noticed that my tenth edition of Traffic Talk, titled Pcapr.net -- where Web 2.0 meets network packet analysis, has been posted. From the article:

Solution provider takeaway: Pcapr.net is a free packet collaboration site hosted by Mu Dynamics. Solution providers can participate in the community to exchange, analyze and gather traces for testing products or processes for their customers, including network packet analysis.

Not many networking solution providers are happy with the apparently limited number of network traces available for testing their products or processes. Hardly a day goes by on a network-focused mailing list without a participant asking, "Where can I download network traffic to test X?" Fortunately for anyone who wants to take network traffic exchange to a new level, Mu Dynamics has answered the call. Its Pcapr.net site is the self-proclaimed "Web 2.0 for packets." In this edition of Traffic Talk, we'll take a tour of Pcapr.net to see what features it offers networking solution providers, including network packet analysis.

by Richard Bejtlich (noreply@blogger.com) at March 08, 2010 02:09 PM

iDogg

I finally broke down and got a smart phone

My wife and I picked up a couple of Droids yesterday.  Her phone was broken and mine was up for the New Every Two.  Since they are currently 2 for 1, we took the jump.

This is my first real smart phone.  I like it so far.  There are a ton of options.  I can see why this platform as presented (the normal Android interface) wouldn’t be ideal for everyday users, and is really aimed at people who aren’t afraid to mess around with their phone.

I like the Gmail integration.  The touch screen works better than I thought it would and I have no problems with the on screen keyboard.  If I’m doing a bit more typing, I’ll slide the physical keyboard out.  Wireless on a phone is great as I’m normally around a wireless AP when at work or at home.  The 3G will work fine in between.

There are a couple of things that bug me a bit.  Novell is still working on their sync tool for modern phones to replace the GroupWise Mobile server.  Outside of the SMTP and IMAP combo, there’s no good way to get your GroupWise mail on the phone.  The default GW8 webaccess interface doesn’t work with the default browser or the dolphin browser.  You have to switch it to basic mode to open mail.  Not that aesthetics are everything, but the basic mode in webaccess is just that, BASIC.  For now, I’ll just be forwarding my Zenoss and other SMTP alerts to my gmail account.

I downloaded a task killer tool which I find myself using often.  I’m not sure how much I really need to be using it, but to keep the battery from getting sucked down, I’m using it.  That’s a bit annoying and takes me back to the old Mac operating system where you had to manage your RAM manually for each application.  It’s not the end of the world, but I guess it’s the price you pay for multitasking.

by Ian at March 08, 2010 01:02 PM

Standalone Sysadmin

Back to our normally scheduled blog posts

Or as (ir)regular as they normally are. I really hope that you enjoyed the flashback week, and got something useful from it. I’m going to try to do it again next year on the first full week of March.

Now it’s just back to the daily grind for me. I’ve been rehashing some Nagios configuration and I’ve unearthed an ancient relic! How fun! Configuration archaeology is a hobby of mine, and to find a gem that hasn’t (as far as I can tell) been mentioned on the official site since 2002? That’s GREAT! I’ve still got to go through the source code to make sure that it doesn’t do anything interesting, but it’s out of my config now.

As it turns out, my recent attention to Nagios is multifaceted. I’m cleaning up the config and tightening up the alert rules, but also, I’m going to be giving a 45 minute talk at the Professional IT Community Conference in May. If you’re in the northeast US, you should definitely make it! And you should hurry and register while the early bird special is going!


by Matt Simmons at March 08, 2010 12:17 PM

A Year in the Life of a BSD Guru

BSD for Linux Users Audio now Available

The audio for my SCALE 2010 talk on BSD for Linux Users is now available in mp3 format. The accompanying slides are in PDF format.

March 08, 2010 09:13 AM

CiscoZine

March 2010: three new Cisco vulnerabilities

On March 3, 2010, the Cisco Product Security Incident Response Team (PSIRT) published three important vulnerability advisories:

  • Cisco Digital Media Player Remote Display Unauthorized Content Injection Vulnerability
  • Cisco Digital Media Manager Vulnerabilities
  • Cisco Unified Communications Manager Denial of Service Vulnerabilities

Cisco Digital Media Player Remote Display Unauthorized Content Injection Vulnerability
A vulnerability exists in the Cisco Digital Media Player that could allow an unauthenticated attacker to inject video or data content into a remote display.

Vulnerable Products
Cisco Digital Media Player versions earlier than 5.2 are affected by this vulnerability.

Details
Cisco Digital Media Players are IP-based endpoints that can play high-definition live and on-demand video, motion graphics, web pages, and dynamic content on digital displays. The Cisco Digital Media Player contains a vulnerability that could allow an unauthenticated attacker to inject video or data content into a remote display.

Impact
Successful exploitation of the vulnerability could allow an unauthenticated attacker to inject video or data content into a remote display.

Link: http://www.cisco.com/…/security_advisory09186a0080b1b925.shtml

 

Multiple Vulnerabilities in Cisco Digital Media Manager
Multiple vulnerabilities exist in the Cisco Digital Media Manager (DMM). This security advisory outlines details of the following vulnerabilities:

  • Default credentials
  • Privilege escalation vulnerability
  • Information leakage vulnerability

These vulnerabilities are independent of each other.

Vulnerable Products
The following products are affected by vulnerabilities that are described in this advisory:

  • Cisco Unified Communications Manager 4.x
  • Cisco Unified Communications Manager 5.x
  • Cisco Unified Communications Manager 6.x
  • Cisco Unified Communications Manager 7.x

Details
Cisco Unified Communications Manager is the call processing component of the Cisco IP Telephony solution that extends enterprise telephony features and functions to packet telephony network devices, such as IP phones, media processing devices, VoIP gateways, and multimedia applications.

Impact
Successful exploitation of the vulnerabilities that are described in this advisory could result in the interruption of voice services. Affected Cisco Unified Communications Manager services may require a manual restart to restore voice services.

Link: http://www.cisco.com/…/security_advisory09186a0080b1b923.shtml

 

Cisco Unified Communications Manager Denial of Service Vulnerabilities
Cisco Unified Communications Manager (formerly Cisco CallManager) contains multiple denial of service (DoS) vulnerabilities that if exploited could cause an interruption of voice services. The Session Initiation Protocol (SIP), Skinny Client Control Protocol (SCCP) and Computer Telephony Integration (CTI) Manager services are affected by these vulnerabilities.

Vulnerable Products
The following products are affected by vulnerabilities that are described in this advisory:

  • Cisco Unified Communications Manager 4.x
  • Cisco Unified Communications Manager 5.x
  • Cisco Unified Communications Manager 6.x
  • Cisco Unified Communications Manager 7.x

Details
Cisco Unified Communications Manager is the call processing component of the Cisco IP Telephony solution that extends enterprise telephony features and functions to packet telephony network devices, such as IP phones, media processing devices, VoIP gateways, and multimedia applications.

Impact
Successful exploitation of the vulnerabilities that are described in this advisory could result in the interruption of voice services. Affected Cisco Unified Communications Manager services may require a manual restart to restore voice services.

Link: http://www.cisco.com/…/security_advisory09186a0080b1b924.shtml


© Fabio Semperboni for CiscoZine, 2010.

by Fabio Semperboni at March 08, 2010 08:45 AM

Anton Chuvakin - Security Warrior

Simple Log Review Checklist Released!

Today, many people are looking for very simple solutions to big and complex problems – and the area of logging and log management is no exception. Following that theme, we have created a "Critical Log Review Checklist for Security Incidents" which is released to the world today.

In addition to HTML, PDF or DOC versions are available as well (alternative hosting location is here). Feel free to modify the checklist for your own purposes or for internal distribution in your organization - but please keep the attribution to the authors.

The log cheat sheet presents a checklist for reviewing critical system, network and security logs when responding to a security incident. It can also be used for routine periodic log review. It was authored by Dr. Anton Chuvakin and Lenny Zeltser (BTW, Lenny has other useful security cheat sheets on malware analysis, security architecture, DDoS, etc. here).

Here is the embedded version from DocStoc:


Critical Log Review Checklist for Security Incidents

Enjoy!


by Dr Anton Chuvakin (noreply@blogger.com) at March 08, 2010 07:34 AM

Chris Siebenmann

Exceptions versus error return values

Python has two ways of signalling that a function has failed; you can raise an exception or return a special error value of some sort. I use both techniques in different circumstances; since I've recently been writing some Python code, I've been thinking about exactly what those circumstances are, as far as I can tell.

(Self-analysis is tricky given that I don't particularly think through the choice when I'm making it; I handle errors however seems right for the function I'm writing at the time.)

Generally, I tend to use error return values if I expect failure to be routine, especially if there is a natural return value that is easy for callers to use. For example, getting a list of IPv4 and IPv6 addresses for a host; it's routine to look up nonexistent names (or at least names with no IP addresses), and returning an empty list is an easy return value for callers to use (since in many cases they will just iterate through the list of IPs anyways).
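
As a rough illustration of the error-return style described above, here is a minimal sketch of such an address lookup. It is not the author's code: the function name `host_addresses` and the injectable `resolve` parameter (there only so the lookup can be replaced when testing) are assumptions; `socket.getaddrinfo` and `socket.gaierror` are the standard library's real lookup call and lookup-failure exception.

```python
import socket


def host_addresses(name, resolve=socket.getaddrinfo):
    """Return the IPv4/IPv6 addresses for name, or [] if it has none."""
    try:
        infos = resolve(name, None)
    except socket.gaierror:
        # A failed lookup is routine here, so report it as "no addresses".
        return []
    addresses = []
    for family, _type, _proto, _canon, sockaddr in infos:
        # getaddrinfo returns one entry per socket type, so skip duplicates.
        if family in (socket.AF_INET, socket.AF_INET6) and sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

A caller can then write `for ip in host_addresses(name): ...` without any special handling; a nonexistent name simply produces zero iterations.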

I use exceptions if I expect failure to be rare, especially if there is nothing that the direct caller of a function is going to do to handle the problem. If the only thing that I'll do on failure is abort the program with a pretty error message, there's no need to complicate all of the code between the program's main routine and the failing function with code to check for and immediately return the error. (The obvious exception is if there is cleanup work to be done on the way out, but I've come up with ways to handle that, similar to phase tracking.)

I'm pretty sure that I'd use exceptions even for common failures if they had to be handled by someone other than the function's direct caller; I don't like cluttering functions up with a bunch of 'if error: return error' code.
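
The exception style can be sketched in the same way. Everything in this example is hypothetical (the `ConfigError` class, the key=value format, the `myprog` name); the point is only that the intermediate function contains no 'if error: return error' code and the failure is handled once, at the top.

```python
import sys


class ConfigError(Exception):
    """Raised when the (hypothetical) config text cannot be used."""


def parse_line(line, lineno):
    if "=" not in line:
        raise ConfigError("line %d: expected key=value, got %r" % (lineno, line))
    key, value = line.split("=", 1)
    return key.strip(), value.strip()


def parse_config(text):
    # No error-checking clutter here: a ConfigError simply passes through.
    settings = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        if line.strip():
            key, value = parse_line(line, lineno)
            settings[key] = value
    return settings


def main(text):
    try:
        settings = parse_config(text)
    except ConfigError as err:
        # The only handling: a pretty error message and a failure exit status.
        sys.stderr.write("myprog: %s\n" % err)
        return 1
    print(settings)
    return 0
```

Here `parse_config` stays free of error plumbing even though it sits between the code that detects the problem and the code that reports it.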

This view is not the common Python one. As we can see from the standard library, the Pythonic way uses exceptions a lot more often than I do.

(I'd argue that this is a sensible tradeoff for a library, too. The advantage of exceptions is that they are unambiguous signals of failures that you can't possibly confuse with valid return values, and they force people using your library to explicitly deal with errors.)

by cks at March 08, 2010 06:41 AM

March 07, 2010

Everything Sysadmin

Tom @ Usenix LISA 2010, San Jose, CA, Nov 7-12, 2010

Tom's presentation is TBD. (Including this with the "appearances" tag so it shows up on the navigation)

by Tom Limoncelli at March 07, 2010 09:44 PM

Tom @ LOPSA PICC in NJ, May 7-8, 2010

Tom will be the Saturday opening keynote, plus he will be teaching his two most popular half-day classes: Time Management for System Administrators, and "Help! Everyone hates our IT department!". LOPSA NJ PICC is in New Brunswick, NJ, May 7-8, 2010. It is a regional conference, everyone is invited. For more information: http://picconf.org

by Tom Limoncelli at March 07, 2010 09:41 PM

Chris Siebenmann

Why I don't expect third-party support for OpenSolaris

One of the common reactions to Oracle's potentially ambivalent attitude towards providing OpenSolaris support is that since OpenSolaris is open source, third parties can spring up to provide support for it even if Oracle doesn't. However, I'm fairly pessimistic about the chances of this; even if OpenSolaris itself becomes reasonably popular, I don't think that we'll ever see an OpenSolaris equivalent of Red Hat or Canonical.

There are two reasons for this. One of them is the difference between forking code and merely supporting it, which comes down to your ability to get your bugfixes accepted upstream. My impression to date is that in practice there are relatively few outside contributors to OpenSolaris and that it is hard to get changes accepted upstream. This pushes anyone attempting to do OpenSolaris support towards de facto forking OpenSolaris, which is expensive and thus makes you unprofitable.

(Some casual searching didn't turn up any information about the rate of outside contributions to OpenSolaris that's more recent than 2008, when the news wasn't good. Certainly the OpenSolaris repository shows very few signs of contributions from outside developers, and there is no sign that the practices described in 2008 have changed much. Note that pushing changes upstream is hard at the best of times; you can imagine how much worse this gets if the upstream is not really interested in the whole business of outside contributions, especially if something is going to require significant amounts of effort and time from upstream developers.)

The other reason is more subtle. In order to really support code, you must have good programmers who understand it. With Sun not really being very enthusiastic about outside contributions, there are not many people like that outside of Sun (or, well, outside of Sun before Oracle took over and people started leaving). In addition, your good OpenSolaris programmers are probably going to face the constant temptation of taking a job with Oracle where they can actually work directly on OpenSolaris; the better they are and the more passionate about OpenSolaris they are, the higher the temptation. The less expert your programmers are the less attractive your support is, since you can't diagnose and fix people's problems as fast or as well.

(And if you can find good expert OpenSolaris programmers right now it's pretty likely that they're quite passionate, given the obstacles to acquiring that expertise.)

by cks at March 07, 2010 05:27 AM

March 06, 2010

TaoSecurity

Einstein 3 Coming to a Private Network Near You?

In my Predictions for 2008 I wrote:

Expect greater military involvement in defending private sector networks... The plan calls for the NSA to work with the Department of Homeland Security (DHS) and other federal agencies to monitor such networks to prevent unauthorized intrusion, according to those with knowledge of what is known internally as the "Cyber Initiative."

Now in Feds weigh expansion of Internet monitoring we read:

Homeland Security and the National Security Agency may be taking a closer look at Internet communications in the future.

The Department of Homeland Security's top cybersecurity official told CNET on Wednesday that the department may eventually extend its Einstein technology, which is designed to detect and prevent electronic attacks, to networks operated by the private sector. The technology was created for federal networks.

Greg Schaffer, assistant secretary for cybersecurity and communications, said in an interview that the department is evaluating whether Einstein "makes sense for expansion to critical infrastructure spaces" over time.

Not much is known about how Einstein works, and the House Intelligence Committee once charged that descriptions were overly "vague" because of "excessive classification." The White House did confirm this week that the latest version, called Einstein 3, involves attempting to thwart in-progress cyberattacks by sharing information with the National Security Agency.


The first step towards creating Cyber NORAD is instrumentation. Stay tuned.

by Richard Bejtlich (noreply@blogger.com) at March 06, 2010 10:28 PM

The Blog of Ben Rockwood

FAST 2010 Proceedings Available

I've missed FAST 2010 yet again.... but, good news! The complete FAST 2010 Proceedings (PDF) are available for free. USENIX members can also view the presentation videos online.

by benr at March 06, 2010 10:28 PM

TaoSecurity

Making a Point with Pressure Points

Imagine you're a martial arts student. One day you have a guest instructor, accompanied by some of his black belts. They're experts in so-called "pressure point fighting." You've heard a little of this system, whereby practitioners can knock out adversaries with a series of precise strikes that lack the power of a brute-force approach. Until today you've had no direct experience. You may be skeptical, or maybe you believe such techniques are possible.

The seminar starts. You watch the guest instructor explain his techniques. He starts knocking out his black belts. Maybe you believe what you see, or maybe you don't. Then the instructor asks for volunteers, and several of your fellow students agree. The instructor knocks them all out, including a student you really trust to not "take a fall" to make the guest "look good." You ask the student "what happened?" and he replies "that dude knocked me out!"

Next the black belts fan out through the class to help teach pressure point techniques. They ask you if you want to get knocked out with a three-strike technique, or if you just want to feel disoriented with a two-strike technique. You decide you're a believer at this point, but you want to see what it feels like to receive a two-strike technique. Sure enough, two rapid strikes later, you're wondering what happened but are still conscious. That's all you need to believe; you're glad you're not lying on the floor, out cold!

The class ends. Several bystanders were watching through the studio's windows. Some of them are laughing. They think the whole class was fake, a joke, or stupid. Some witnesses are curious. They believe what they saw and want to know more. A few ask questions. Others mumble to themselves incoherently, probably intoxicated or mentally ill.

One of the students decides to talk to a famous yet local news reporter about his experience. This widely-read newspaper reports the story the next day, attracting a lot of attention.

With a wider audience, an extended discussion takes place about this pressure-point fighting activity.

One company conducts a Webcast and a spokesperson says "my mom used to knock me out with a frying pan when I was a kid!" He also says there's no difference between pressure-point fighting and getting punched in the face.

Another company decides to register a domain name called "pressurepointfighting.biz" and starts talking about how it works, applying what they know from Western boxing. This misses the mark but uninformed observers can't really tell the difference.

A third company jumps on the pressure point fighting bandwagon, issuing supposedly original research, inventing its own analysis, and integrating the technique into its marketing material. It turns out someone at the company had a confidential agreement with the original pressure point fighting instructor, but unilaterally decided to take a few pages out of his notebook and run to the market to make a fast buck.

A fourth company knows a lot about pressure point fighting. It writes original reporting based on its experience. Critics claim this company is just offering marketing based on the new craze.

Reaction to the news among those without direct experience is mixed, as might be expected.

Some readers are martial artists themselves. They fear being irrelevant. They are afraid their skills are not sufficient. They decide to ridicule anyone who participated in the seminar, or who has knowledge.

Some readers distrust authority. They think these techniques are just a government conspiracy to justify additional police powers. The only reason anyone is talking about such affairs is their need to get greater budgets for their oppressive police powers, man!

Some readers think the whole affair is "fear, uncertainty, and doubt" (FUD). Who could knock out a person by hitting a few pressure points? It's all a lie, or just the latest craze. It must be fake.

Some readers have been learning and practicing pressure point fighting for the last several years. They know it isn't a joke, and it is real. Also, some readers without experience realize they should learn more about pressure point fighting. That knowledge could save their lives, or the lives of those close to them. These like-minded people communicate privately, since the public arenas are now clogged with too many false discussions.

Aside from the fact that advanced persistent threat is an adversary, and not a fighting technique, this story explains the last 6 weeks of APT activity in the security industry. Not all factors are included, but enough to make my point.

Incidentally, the pressure point class is true, at least as far as the class content is described.

by Richard Bejtlich (noreply@blogger.com) at March 06, 2010 06:59 PM

Keeping FreeBSD Applications Up-to-Date in BSD Magazine

The March 2010 BSD Magazine includes an article I wrote titled Keeping FreeBSD Applications Up-to-Date.

It's a sequel to my article in the January 2010 BSD Magazine titled Keeping FreeBSD Up-to-Date: OS Essentials.

Together, these two articles replace the versions I wrote in 2005.

I wrote these articles to demonstrate the variety of ways a system administrator can keep the FreeBSD operating system and applications up-to-date, with examples showing commands and effects.

by Richard Bejtlich (noreply@blogger.com) at March 06, 2010 09:22 AM

Chris Siebenmann

Pushing code changes upstream is hard work

This is a followup to SupportingVsForking, where I talked about the difference between supporting some open source code and forking it being whether you could get your changes accepted upstream. One thing that is not widely understood is that getting bugfix changes accepted upstream is hard work at the best of times, with a cooperative upstream.

(We can see this from how many private changes to the Linux kernel each Linux distribution maintains.)

The problems are many. Often there is a conflict between the expedient way for you to fix a problem now and the 'right' way to fix the problem, which the upstream is going to argue for and which may require quite a lot of development (and arguing with developers); sometimes the upstream won't even know what the right way is, but they'll know that your way is the wrong way. In some cases, you and the upstream may disagree about whether there is a bug and (if there is a bug) where it exists and what exactly it is. Sometimes the upstream may accept that there is a bug but feel that fixing it is too disruptive at the current time.

(And all of this assumes that your proposed change is good code. Sometimes it isn't; the most common case in Linux is new hardware drivers, which often contain code that varies from the merely bad to the outright wretched. A distribution often needs to support the new stuff soon and didn't write the drivers; the upstream needs to have code that can be maintained over the long term by people other than its original authors.)

All of these translate to 'thanks but no thanks' for your bug fix or change, which means that you get to maintain more code for a while.

It's worth noting that accepting downstream changes is work for the upstream too. Many of these problems require an investment of time from upstream developers to read code, debate approaches, investigate problems, and so on, and the time of upstream developers is in limited supply. Plus, things like code reviews and arguing with people about whether something is the right approach or is actually a bug are not very rewarding or fun activities, which makes it harder to persuade developers to do them.

(All of this goes even more so if you are adding features or removing limitations instead of fixing bugs, because those raise much larger questions of whether they should be done at all and if your approach is the right approach.)

by cks at March 06, 2010 05:50 AM

March 05, 2010

Slaptijack

The Tech Teapot

Planet Network Management Highlights 2010 Week 9

Highlights from Planet Network Management for Week 9.


by Jack Hughes at March 05, 2010 03:18 PM

cmdln.org

Flashback: Remote kernel logging with netconsole for fun and profit

Have you ever had a machine that was a bit flaky? You know those ones that occasionally crash and don’t write anything useful into the log file. Sometimes you can capture those messages with netconsole. Just revisiting a small walk-through I wrote a while back. Remote kernel logging with netconsole for fun and profit ©2010 cmdln.org [...]

by Nick Anderson at March 05, 2010 03:17 PM

High Scalability

Strategy: Planning for a Power Outage Google Style

We can all learn from problems. The Google App Engine team has created a teachable moment through a remarkably honest and forthcoming post-mortem for February 24th, 2010 outage post, chronicling in elaborate detail a power outage that took down Google App Engine for a few hours.

The world is ending! The cloud is unreliable! Jump ship! Not. This is not evidence that the cloud is a beautiful, powerful and unsinkable ship that goes down on its maiden voyage. Stuff happens, no matter how well you prepare. If you think private datacenters don't go down, well, then I have some rearrangeable deck chairs to sell you. The goal is to keep improving and minimizing those failure windows. From that perspective there is a lot to learn from the problems the Google App Engine team encountered and how they plan to fix them.

Please read the article for all the juicy details, but here's what struck me as key:

by Todd Hoff at March 05, 2010 03:11 PM

TechRepublic Network Administrator

Secret agents: Make SNMP work for you

Blogger Mark Underwood lays out the ways you can use SNMP agents to monitor network devices, and even set it up to send software alerts as well.

—————————————————————————————

Out there, working for you, are agents. Feed them a little port UDP/161,162 and they’ll deliver a dossier on many network devices, in the form of a Management Information Base (MIB).

Just got hired after the last network administrator got promoted to CIO? Grab a free network management tool that has an SNMP (Simple Network Management Protocol) agent listener (SpiceWorks, Net-SNMP, NetXMS, Nagios, Zenoss and many more), then head over to the local Wi-Fi-enabled coffee establishment. Chances are good you’ll have charts and diagrams to visualize what you’ve gotten yourself into.

SNMP considerations

Here’s a list of pros and cons for using SNMP agents, which I’ll discuss in more detail below.

Pros

  • Intrusion tripwires
  • Quick network overview
  • Indispensable for switches representing a single point of failure
  • Proactive warnings for failing hardware
  • Enable performance monitoring
  • Detect software failures and anomalies
  • Best practice for industry standard, interoperable device descriptions with ontologies

Cons

  • False Positives
  • Log management
  • Too much information; can’t see the forest
  • Complex monitoring environment
  • Configuration Management
  • Agent authentication and default public settings
  • Multiple agent message formats

SNMP basics

The basic notion of SNMP is that of an agent-based notification system. Each device, even many low level switches and printers, is equipped with an agent ready to do your bidding. The notification, or “trap,” can be generated by an agent developed by the device manufacturer, or listener software can monitor systems for specific events, such as particular items of interest in an event log, and send traps to an SNMP trap handler or other network management tool.

SNMP can be thought of as one framework within a number of overlapping frameworks that include Microsoft Windows Management Instrumentation (WMI), Web-Based Enterprise Management (WBEM) and the Common Information Model (CIM). CIM has evolved into an entire object model that DMTF describes using graphical language taken from the Unified Modeling Language (UML).

SNMP does Windows or Linux

Microsoft has fully embraced the CIM model in WMI. For example, open a command window on many Vista, Windows 7, or Server 2008 machines and type:

winrm enumerate wmicimv2/Win32_ComputerSystem

The tool will list a machine’s basic hardware information such as the motherboard manufacturer, but also Domain membership, status of the administrative password, server roles, current user name, machine name, boot options, and more. Using WMI, you can deck out the walls of your cubicle with ample justification for upgrading the server farm. Figure A shows such SNMP-enabled charts. Similar monitoring capabilities are available for Linux. For instance, the free WebNMS product implements an SNMP agent but also offers management through HTTP.

Figure A

Windows Graphs from SNMPBOY.MSFT.NET

Worth the effort and cost?

The hurried and harried network administrator would be right to question how much effort to put into studying SNMP and related topics. The Distributed Management Task Force (DMTF), primary custodian of knowledge about SNMP and related topics, insists that its tutorial documents are suitable for “management application developers, instrumentation developers, information technology managers and system administrators.” That may be over-reaching. Scott Neumann of the CIM Road Map Task Force describes CIM as “the most developed and widely accepted model for describing an electrical network.” That said, the subject is as deep and as complex as big or heterogeneous networks can get.

SNMP for software?

Here’s where the fun begins. SNMP agents are not only for physical gadgetry. For instance, Oracle Enterprise Manager (OEM) can be tweaked to respond to alerts from Oracle VM, Oracle Database, or Fusion Middleware. The use case Oracle offers is its Contact Center Anywhere (CCA) application. Oracle walks prospective SNMP users through useful telephony-related traps, but also straightforward problems such as software license failures, “Malicious Call Trace,” Automated Call Distribution Voice Mail, etc. These examples show how an application could be engineered to help managers understand its performance, to automatically escalate certain conditions, or to implement enterprise-specific workflow. These could result in exciting improvements in the way software is designed.

Be advised that there is risk in this Spy-vs-Spy world. SNMP’s original designers were a trusting lot, and security seems to have taken a back seat to disclosure. SNMP “community strings” function as passwords between the manager and the agent, and under SNMP versions 1 and 2c the community string appears unencrypted in every packet sent between them. Don’t risk having your SNMP agents become double agents: don’t accept the default values of “public” or “private” for community strings. “Private” is especially problematic, as it may permit an attacker to modify a device’s configuration. When it makes sense to do so, and when the device allows it, limit which IP addresses are permitted to access SNMP agents. While not all network devices support it, SNMP Version 3 adds authentication and encryption, which reduces the risk of man-in-the-middle attacks that could not only discover how to get out of your DMZ but also potentially reconfigure devices.
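The two precautions above, avoiding default community strings and restricting which addresses may query an agent, can be sketched as a small Python audit. This is only an illustration under stated assumptions: the inventory layout, device names, and function names are hypothetical, and the code inspects a list you supply rather than querying any device.

```python
import ipaddress

# The well-known defaults named above.
DEFAULT_COMMUNITIES = {"public", "private"}


def audit_communities(devices):
    """Return (device, community) pairs that still use a default string.

    `devices` is a hypothetical inventory: device name -> list of
    community strings configured on that device.
    """
    findings = []
    for name, communities in sorted(devices.items()):
        for community in communities:
            if community.lower() in DEFAULT_COMMUNITIES:
                findings.append((name, community))
    return findings


def is_source_allowed(source_ip, allowed_networks):
    """True if source_ip falls inside one of the permitted networks."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(net) for net in allowed_networks)


if __name__ == "__main__":
    inventory = {
        "core-switch": ["public", "n0c-r3ad"],
        "printer-3f": ["Private"],
        "ups-1": ["x9!kd"],
    }
    for device, community in audit_communities(inventory):
        print(f"{device}: default community string {community!r} in use")
    print(is_source_allowed("10.0.5.17", ["10.0.5.0/24"]))
```

The actual restriction has to be enforced on the device or a firewall; a check like this is only useful for reviewing what you believe is configured.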

Buyer beware tips

A comprehensive buying guide is beyond the scope of this brief post, but here are a few important tips to get you started:

  • Keep in mind that even a small network will have hundreds, even thousands of “devices.” If pricing is based on a device count, round up. Way up.
  • “Automatic Network discovery” is great in principle, but it assumes everything is going your way - i.e., that both agent and manager can see one another.
  • There are hosted as well as internally managed solutions for SNMP monitoring.
  • Your time investment will add up. It won’t necessarily be all in one sitting. Prepare to invest serious time in making use of SNMP alerts. If you’re on a project schedule, the community / commercial options can be a way to get help without a big investment.

Recommended readings

  1. Whemsolutions.com tutorial on the DMTF Common Information Model.
  2. SNMP Penetration Testing Technical Note from SANS Institute.
  3. TCP/IP Guide’s entry on SNMP Version 3 (SNMPv3).
  4. Cisco’s Guide to SNMP.


by Mark Underwood at March 05, 2010 02:00 PM

Standalone Sysadmin

Flashback: Burnout and the toll it takes

You are probably a human. At least, the statistical odds are in your favor. As a human, you experience stress, and how you react to it plays a large part in determining how happy you are. System administrators deal with stress particularly poorly, in general. We assume the role of hero and that’s that. Do what it takes, bask in whatever glory accompanies the successful completion of our task.

There is no downtime in that equation. Immediately following those emergencies, most of us drink depressants to bring ourselves down. On normal days, we require morning stimulants to bring ourselves up. I strongly suspect that some of us are so-called “adrenaline junkies,” hooked on the relative high that we get when there’s an immediate problem that no one can solve but ourselves.

This is unhealthy.

What we really need is to be able to step back, look at the pattern in our lives, and say, “I don’t want to live with this stress.”

When it first hit me that stress is probably the biggest single microproblem for admins, I wrote the following. I hope you find it relevant.


Jack Hughes, over at the Tech Teapot, mentions a very appropriate subject for too many systems administrators: burnout.

As sysadmins, we’re nearly always the go-to person for whatever happens. After a while, we start to get used to it, and lots of times, we can develop a hero complex, carrying the weight of the world on our shoulders, at least in our minds. This isn’t healthy for a lot of reasons, the most important of which is your health.

Here’s an example of what taking your job too seriously can do to you:

Part One

Part Two

Not to ruin the ending, but the most disgusting part is that, while the guy was taking medical leave, his company fired him. To be completely honest, he’s much better off without a company like that, and if your company would do the same thing, then so are you.

To quote Peter Gibbons, “We don’t have a lot of time on this earth. We weren’t meant to spend it this way. Human beings were not meant to sit in little cubicles staring at computer screens all day…”

Even one of the most prominent Systems Administrators around, Tom Limoncelli, advocates leaving the pressure at work when you head home. For those of us on call 24/7/365, that can be a little hard, but it’s important to try.


by Matt Simmons at March 05, 2010 10:52 AM

A Year in the Life of a BSD Guru

Best of FreeBSD Basics on Kindle

The Best of FreeBSD Basics is now available for the Kindle.

March 05, 2010 09:26 AM

Chris Siebenmann

PCs are (or can be) Unix workstations

My entry about the end of Sun was posted on Hacker News and garnered a comment thread. In two comments that I'm condensing and excerpting here, HN user rbanffy wrote (in context):

There is a huge difference between a glorified PC and a Unix workstation.

[...]

Unix workstations were built to run Unix. A Mac pro is essentially a PC. [...]

The difference looks subtle now, when every desktop computer is essentially the same. For those who lived through this, like the writer of the original article, it was blatantly obvious.

Since my name is being invoked, I am going to speak up: I reject this view. A Unix workstation is no more and no less than a machine, dedicated to a single user, that runs Unix with a graphical environment. The idea that PCs cannot be Unix workstations is the same kind of elitism and mythology that people deride in Lisp fanatics, and it is clearly wrong. To argue otherwise is to use a very selective reading of the history of Unix workstations, one that ends the moment that Unix workstation companies started making products using PC components and PC companies started making 'Unix workstation' grade components usable on PCs.

It is also to use a selective reading of the history of the marketing of Unix workstations. Very few Unix workstations were sold as ultra high performance machines; most were sold as 'fast enough and cheap enough', and quite often this meant not very fast and as cheap as possible while still running Unix. In fact, at the height of the Unix workstation era you could routinely find workstations without floating point hardware.

Yes, the Unix workstations generally worked with less sweat and effort. This was for the same reason that Apple Macs just work, namely that the workstation vendor controlled both hardware and software and so could closely integrate them.

Yes, Unix workstations originally had better performance than PCs. This was because PC performance was terrible, in fact all performance was terrible, and people had to pay extra for the ability to run Unix at (marginally) acceptable speeds. Both parts changed over time; by the end of the era of dedicated Unix workstations, they were worse than PCs for the same (or more) cost (cf). One major reason that the march of the cheap slaughtered the dedicated Unix workstation vendors was that PCs got good enough to be good Unix workstations.

(Unix vendors could still build $10,000 machines that performed better, but it turned out that people by and large didn't need and weren't interested in that much performance; once they could get what they wanted and needed for less than $10,000, they stopped paying $10,000 for machines. Late-period Unix workstation vendor marketing tried desperately to persuade people that they really did need that performance, for obvious reasons.)

by cks at March 05, 2010 08:11 AM

March 04, 2010

the life of a sysadmin.

Postfix cleanup

Did you know that Postfix's cleanup daemon consolidates messages going to the same envelope recipient? I did not.

March 04, 2010 11:15 PM

cmdln.org

Installing NRPE on XenServer

I like to have as little run in dom0 as possible. However some things you really need checked from dom0, like the status of your raid perhaps. Just some quick instructions on getting Nagios NRPE running in XenServer. Install EPEL repository and disable it by default (remember we don’t want to accidentally install unnecessary packages) wget http://download.fedora.redhat.com/pub/epel/5/$(uname [...]

by Nick Anderson at March 04, 2010 06:05 PM

TaoSecurity

Bejtlich Teaching at Black Hat EU and USA 2010

Black Hat was kind enough to invite me back to teach multiple sessions of my 2-day course this year.

Next is Black Hat EU 2010 Training on 12-13 April 2010 at Hotel Rey Juan Carlos I in Barcelona, Spain. I will be teaching TCP/IP Weapons School 2.0.

Registration is now open. Black Hat has three price points and deadlines for registration remaining.

  • Regular ends 1 Apr

  • Late ends 11 Apr

  • Onsite starts at the conference


Finally we have Black Hat USA 2010 Training on 25-28 July 2010 at Caesars Palace in Las Vegas, NV. I will be teaching two sessions of TCP/IP Weapons School 2.0, one on the weekend and one during the week.

Registration is now open. Black Hat has set five price points and deadlines for registration.

  • Super Early ends 15 Mar

  • Early ends 1 May

  • Regular ends 1 Jul

  • Late ends 22 Jul

  • Onsite starts at the conference


Seats are filling -- it pays to register early!

If you review the Sample Lab I posted earlier this year, this class is all about developing an investigative mindset by hands-on analysis, using tools you can take back to your work. Furthermore, you can take the class materials back to work -- an 84 page investigation guide, a 25 page student workbook, and a 120 page teacher's guide, plus the DVD. I have been speaking with other trainers who are adopting this format after deciding they are also tired of the PowerPoint slide parade.

Feedback from my 2009 sessions was great. Two examples:

"Truly awesome -- Richard's class was packed full of content and presented in an understandable manner." (Comment from student, 28 Jul 09)

"In six years of attending Black Hat (seven courses taken) Richard was the best instructor." (Comment from student, 28 Jul 09)

If you've attended a TCP/IP Weapons School class before 2009, you are most welcome in the new one. Unless you attended my Black Hat training in 2009, you will not see any repeat material whatsoever in TWS2. Older TWS classes covered network traffic and attacks at various levels of the OSI model. TWS2 is more like a forensics class, with network, log, and related evidence.

I plan to retire TWS2 after Vegas this year and teach TWS3 in 2011, if Black Hat invites me back.

I recently described differences between my class and SANS if that is a concern.

I look forward to seeing you. Thank you.

by Richard Bejtlich (noreply@blogger.com) at March 04, 2010 05:49 PM

Bejtlich to Speak at FIRST 2010

I'm happy to report that I will present Building a Fortune 5 CIRT Under Fire at FIRST 2010 on 16 Jun 10 in Miami, FL. I plan to attend the majority of the conference, since it is one of the few focused on incident detection and response. I hope to see you there!

by Richard Bejtlich (noreply@blogger.com) at March 04, 2010 05:16 PM

SysAdmin1138

3rd party application headaches

A while back we managed to push through some new purchasing rules that required IT review of any technology purchases. This is needed, since end-user departments haven't the first clue what'll work with our existing infrastructure, and it helps us advise them of complications. For instance, if a product requires PHP on IIS for some reason, we really want to be able to let them know before they purchase that doing so will require a server purchase as well, since we don't support that environment currently.

Unfortunately, a small number of things still slip through. Perhaps we didn't read the manuals enough. Perhaps a high enough manager expended sufficient political capital to Make It So. But complications can arise when we go to make the new thingy work.

A case in point:

For the last two weeks I've been attempting to get a certain package up and running that has email capabilities. This has to fit within our Exchange system, which is a rather common environment. What isn't so common, it seems, is our insistence on secure protocols for authentication. While Exchange 2007 is perfectly willing to support naked POP3 and even naked SMTP-Auth, we, on the other hand, are not so forgiving. We wisely have a security standard in place that says that all authentication traffic must be encrypted, and this prevents us from running POP3 and SMTP in a way that allows passwords in the clear.

This package has support for one SSLed service: POP3-SSL. We don't support POP3 since our users were forever screwing themselves thanks to the default of "Delete on retrieval" in most mailer clients, which kind of pissed them off when they got to the office the next morning and their mailbox was empty.

Thanks to the use of stunnel I was able to tunnel unencrypted IMAP to Exchange's IMAP-SSL port at least, so that channel got working.

Right now I'm trying to convince stunnel and the application to work together to get SMTP-TLS working. Sadly for me, I have to wait a couple of hours before the app attempts an SMTP check for me to see if it works.
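One way to avoid waiting on the application's schedule is to probe the SMTP endpoint by hand. The sketch below uses Python's standard smtplib to ask a server whether it offers STARTTLS and whether AUTH is advertised once the session is encrypted. The function name, the host in the usage line, and the smtp_factory parameter (there only so the logic can be exercised without a live server) are assumptions for illustration, not anything from the setup described here.

```python
import smtplib
import ssl


def check_smtp_starttls(host, port=25, timeout=10, smtp_factory=smtplib.SMTP):
    """Report whether the server offers STARTTLS, and AUTH after upgrading."""
    result = {"starttls_offered": False, "auth_after_tls": False}
    with smtp_factory(host, port, timeout=timeout) as server:
        server.ehlo()
        if not server.has_extn("starttls"):
            return result
        result["starttls_offered"] = True
        # The default context verifies the certificate; a server with an
        # internally issued certificate would need a context that trusts it.
        server.starttls(context=ssl.create_default_context())
        # Capabilities must be requested again after the TLS upgrade.
        server.ehlo()
        result["auth_after_tls"] = server.has_extn("auth")
    return result


if __name__ == "__main__":
    # Hypothetical host name; substitute the real mail server.
    print(check_smtp_starttls("mail.example.com", 25))
```

A check like this only confirms what the server side advertises; it says nothing about whether stunnel and the application are configured to agree with each other.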

On the 'up' side, we're charging this department by the hour to get this set up. So the labor bill on this will be fairly high.

by SysAdmin1138 at March 04, 2010 05:14 PM

High Scalability

How MySpace Tested Their Live Site with 1 Million Concurrent Users

This is a guest post by Dan Bartow, VP of SOASTA, talking about how they pelted MySpace with 1 million concurrent users using 800 EC2 instances. I thought this was an interesting story because that's a lot of users, it takes big cojones to test your live site like that, and not everything worked out quite as expected. I'd like to thank Dan for taking the time to write and share this article.

In December of 2009 MySpace launched a new wave of streaming music video offerings in New Zealand, building on the previous success of MySpace music. These new features included the ability to watch music videos, search for artists’ videos, create lists of favorites, and more. The anticipated load increase from a feature like this on a popular site like MySpace is huge, and they wanted to test these features before making them live.

If you manage the infrastructure that sits behind a high traffic application you don’t want any surprises.  You want to understand your breaking points, define your capacity thresholds, and know how to react when those thresholds are exceeded.  Testing the production infrastructure with actual anticipated load levels is the only way to understand how things will behave when peak traffic arrives. 

For MySpace, the goal was to test an additional 1 million concurrent users on their live site stressing the new video features.  The key word here is ‘concurrent’.  Not over the course of an hour or day… 1 million users concurrently active on the site. It should be noted that 1 million virtual users are only a portion of what MySpace typically has on the site during its peaks.  They wanted to supplement the live traffic with test traffic to get an idea of the overall performance impact of the new launch on the entire infrastructure.  This requires a massive amount of load generation capability, which is where cloud computing comes into play. To do this testing, MySpace worked with SOASTA to use the cloud as a load generation platform. 
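To put the word ‘concurrent’ in perspective, here is a rough back-of-the-envelope sketch in Python. The first function simply divides the 1 million virtual users across the 800 EC2 instances mentioned in the introduction. The second uses Little's law to estimate how many sessions per hour it takes to sustain a given concurrency; the 10-minute average session length is purely an assumption for illustration, not a figure from the test.

```python
def users_per_instance(concurrent_users, instances):
    """Average number of virtual users each load generator must simulate."""
    return concurrent_users / instances


def sessions_per_hour(concurrent_users, avg_session_minutes):
    """Arrival rate needed to sustain a concurrency level.

    Little's law: concurrency = arrival_rate * session_duration,
    so arrival_rate = concurrency / session_duration.
    """
    return concurrent_users * 60 / avg_session_minutes


if __name__ == "__main__":
    print(users_per_instance(1_000_000, 800))   # 1250.0 users per instance
    # Assumed 10-minute sessions (hypothetical value).
    print(sessions_per_hour(1_000_000, 10))     # 6000000.0 sessions per hour
```

Under that assumed session length, holding 1 million users concurrently corresponds to several million sessions per hour, which is why a concurrency figure is far more demanding than the same number spread over an hour or a day.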

Here are the details of the load that was generated during testing.

by Todd Hoff at March 04, 2010 03:50 PM