by Anton Chuvakin (anton@chuvakin.org) at September 02, 2010 06:11 PM
Amazon.com just posted my five star review of Hacking Exposed: Wireless, 2nd Ed by Johnny Cache, Joshua Wright and Vincent Liu. From the review:
by Richard Bejtlich (noreply@blogger.com) at September 02, 2010 05:28 PM
Quantum cryptography sounds like science fiction, but the people at Quantum Hacking have already cracked a pair of commercial quantum crypto implementations.
Quantum key exchange is generally regarded as a perfect solution to the problem of securely exchanging cryptographic keys. Ensuring that keys are exchanged in a secure manner is a critical part of the process of communicating secret information without any chance of an eavesdropper acquiring the secrets.
One of the foundational principles of modern cryptology theory is Kerckhoffs’ Principle, which states that a cryptosystem should be secure even if everything about the system is known by potential attackers except its key. Shannon’s Maxim, a roughly equivalent but much more succinct formulation, says “The enemy knows the system.”
With such principles of security firmly in mind, two obviously necessary policies for secure systems arise:
There are two generally accepted ways to solve the problem of key exclusivity:
The problem of key exchange has been the subject of much debate, research, and effort over many years. Public key cryptography essentially avoids the whole issue by using key pairs, where the system is not only not compromised if half the keyset falls into the wrong hands, but it works better if that half of the keyset is public knowledge. For certain purposes, however, symmetric key encryption is often preferable, so long as the key exchange problem is solved.
Heisenberg’s uncertainty principle has plagued physicists for decades. When what you study gets small enough, the tiny little particles — photons, the building blocks of things like “light”, for instance — that you need to bounce off of what you are trying to observe are no longer inconsequential to the target of observation. When you want to observe the behavior of an electron, trying to bounce a photon off the electron can alter the state of the electron, leaving you with an uncertain read on the particle’s state. It gets worse as the targets of your observation get even smaller, as in the case of trying to observe photons themselves.
Quantum key exchange makes clever use of this uncertainty principle. Systems that make use of quantum key exchange take advantage of the uncertainty principle to guarantee, at least in theory, that nobody has attempted to observe the key in transit. Any attempt to do so will change the state of the communication, thus producing detectable anomalies, alerting the communicating parties to the presence of an eavesdropper so they will know the key has been compromised and will not use it.
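The detection idea can be illustrated with a toy simulation. The post does not name a protocol, so this sketch assumes a BB84-style exchange with two measurement bases and a simple intercept-and-resend eavesdropper; the function names and the `+`/`x` basis labels are invented for illustration, and no real physics is modeled.

```python
import random


def measure(bit, prepared_basis, measured_basis, rng):
    """Measuring in the preparation basis returns the bit; otherwise the result is random."""
    if prepared_basis == measured_basis:
        return bit
    return rng.randint(0, 1)


def error_rate(n_photons, eavesdrop, rng):
    """Fraction of mismatched bits among photons where sender and receiver bases agree."""
    kept = 0
    errors = 0
    for _ in range(n_photons):
        sent_bit = rng.randint(0, 1)
        sent_basis = rng.choice("+x")
        bit, basis = sent_bit, sent_basis
        if eavesdrop:
            # The eavesdropper must guess a basis, and re-sends in the basis she guessed.
            eve_basis = rng.choice("+x")
            bit = measure(bit, basis, eve_basis, rng)
            basis = eve_basis
        recv_basis = rng.choice("+x")
        recv_bit = measure(bit, basis, recv_basis, rng)
        if recv_basis == sent_basis:
            kept += 1
            errors += recv_bit != sent_bit
    return errors / kept


quiet = error_rate(4000, False, random.Random(1))
tapped = error_rate(4000, True, random.Random(2))
print("error rate without eavesdropper:", quiet)
print("error rate with eavesdropper:   ", tapped)
```

In this model an undisturbed channel shows no errors in the compared bits, while interception pushes the error rate toward one in four. That jump is the anomaly the communicating parties look for.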
In theory, it seems like an infallible system. In practice, the actual security of the system is subject to the limitations of implementation — the one weakness that plagues all cryptosystems. Commercial quantum key distribution systems exist, but the technology is still not a 100% perfectly solved problem.
Such systems typically use a fiber optic cable to communicate data across distances measured in kilometers, employing avalanche photodiodes to detect individual photons. The work of a group of quantum information scientists at the Norwegian University of Science and Technology, known as the Quantum Hacking group, in collaboration with the Max Planck Institute for the Science of Light and the University of Erlangen-Nürnberg, has produced a means to crack two quantum key distribution systems by exploiting a characteristic of the design of avalanche photodiodes.
In simplified form, the crack consisted of a man in the middle attack that works by fooling the photodiode itself, using nothing but off-the-shelf (if somewhat expensive for casual use) components. “Blinding” the receiving system’s photodiodes with a laser so that it cannot read the quantum states of incoming photons causes the diodes to behave as a “classical detector”, recording bit values not due to the quantum states of incoming photons but due to the detection of pulses of brighter light. As such, the eavesdropper can “blind” the intended recipient, receive the key in its stead, then convey the key’s value to the still-blinded avalanche photodiodes in the intended recipient system by way of pulses of bright light.
One of the researchers, Vadim Makarov, said of the crack, “We have exploited a purely technological loophole that turns a quantum cryptographic system into a classical system, without anyone noticing.” (In the picture above, a member of the Quantum Hacking team, Lars Lydersen, tests Clavis2 quantum cryptography system for detector controllability. Photo credit: 2009 Vadim Makarov, www.vad1.com)
Since discovering the vulnerability, the researchers have worked in collaboration with ID Quantique, the vendor for one of the commercial systems, to fix the weakness in this type of quantum key distribution system. The Quantum Hacking group’s paper was published in the Nature Photonics journal, as Hacking commercial quantum cryptography systems by tailored bright illumination.
Photos of the equipment used to analyze the cracked systems and perform the crack are available at the Quantum Hacking site, in Cracking commercial quantum cryptography: how we did it, in pictures. They even provide a link to a photo of ID Quantique engineers feeding pizza to Quantum Hacking researchers as they worked on a fix for the vulnerability.
For the past few days, our NYC office has had incredibly irritating problems with the internet connection. We’ve got service through a local Metro-E provider, but they’re a CLEC, which means they don’t own the lines, they just lease them from the ILEC, which in this case is Verizon.
The root of the issue is that the wiring at the building we’re in is crap. It’s a small 5-story building that used to be apartments and has been converted to offices, and the wiring is just not up for the job. We went through several copper pairs looking for one that was good enough to carry the metro-E signal, and it was all we could do. Before metro-E, we had DSL, where we capped out at just over 1Mb/s…and this is in Manhattan.
Unfortunately, the circuit is currently in the middle of dying, so it’s working sometimes and failing others. I first opened this ticket on Monday, and have exchanged emails with our provider a dozen times or so. They’ll see the issue, but symptoms are vague as to whether it’s their equipment, our equipment, or the line running between our equipment, or (what I’m fairly sure the problem is), the lines entering the building from Verizon.
It wasn’t until last night that they finally saw enough errors on the bridge to get Verizon to commit to a service call tomorrow evening to add a loop. Every other time, everything on the line was hunky-dory. This is why intermittent problems take so long to solve…because all the stakeholders have to be monitoring at exactly the right time for anything to get done.
Meanwhile, I’ve been having to apologize to my users, and give them instructions on how to forward their desk phones to their cells.
Even though the problem isn’t actually with my provider, I would love to get a secondary network connection, because the lines here are just too unreliable. No cable companies will give us service, no fiber companies will touch the building…it’s pretty much just Verizon and their CLECs at this point.
I think we’ve only got 2 more years on the lease?
Consistent Hashing is a specific implementation of hashing that is well suited for many of today’s web-scale load balancing problems. Specifically, it can be seen in use in various caching solutions like Memcached and is applicable to NoSQL solutions as well. Consistent Hashing is used particularly because it solves the main problem with the typical “hashcode mod n” method of distributing keys across a series of servers: changing n remaps almost every key. It does this by allowing servers to be added or removed without significantly upsetting the distribution of keys, and without requiring that all keys be rehashed to accommodate the change in the number of servers.
You can read the full story here.
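As a rough illustration of the idea (not the implementation used by Memcached clients or by the linked article), here is a minimal hash ring. The class name `HashRing`, the server names, the choice of MD5, and the 100 virtual points per server are all assumptions made for the sketch.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers=(), replicas=100):
        self.replicas = replicas  # virtual points per server, to smooth the distribution
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            point = _hash("%s#%d" % (server, i))
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        for i in range(self.replicas):
            point = _hash("%s#%d" % (server, i))
            del self._owner[point]
            self._points.remove(point)

    def get(self, key):
        """Return the server owning the first ring point after the key's hash (wrapping)."""
        index = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[index]]


keys = ["key-%d" % i for i in range(1000)]
ring = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
before = {k: ring.get(k) for k in keys}
ring.add("cache-e")
after = {k: ring.get(k) for k in keys}

moved_ring = [k for k in keys if before[k] != after[k]]
moved_mod = [k for k in keys if _hash(k) % 4 != _hash(k) % 5]
print("keys moved with the ring: ", len(moved_ring))
print("keys moved with hash mod n:", len(moved_mod))
```

Every key that changes owner on the ring lands on the newly added server; the rest stay where they were. With "hashcode mod n", going from four servers to five reassigns most keys.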
Tonight, we rebooted a machine that hung (presumably due to OOM or other funkiness) and it came back in the bios saying:
Foreign configuration(s) found on adapter
Our managed hosting support weren't sure what to make of this, so we decided to make a new home (from backups) for the services on this now-dead machine. Dell won't be helping debug this until tomorrow.
This is one of many total data losses I've observed on RAID sets in recent months - all due to RAID failures. Thankfully, we have backups that get shipped to HDFS. We monitor those backups. We also have puppet and other automation to help move and rebuild services on a new host. We're equipped to handle this kind of failure.
This leads me to a new conclusion: The 'R' in RAID is a lie. It is not redundant. Treating it that way can lead you to the raid-is-backup fallacy.
Wikipedia has this to say about Redundancy (engineering): "In engineering, redundancy is the duplication of critical components of a system with the intention of increasing reliability of the system, usually in the case of a backup or fail-safe."
Adding more parts (complexity) to a system doesn't often increase its reliability. Even taking into account the disk redundancy you might get with mirror or parity, you're still hedging that the RAID card doesn't die, which it will. Everything's MTBF comes eventually, so weigh your risk.
Back to my conclusion that RAID is not redundant. RAID is not dead, I'm just done viewing RAID as a continuity-through-drive-failure technology. RAID has other benefits, though. It achieves more than just redundancy (when your card doesn't die).
RAID makes multiple drives present as a single drive device to the OS, right? Right. RAID allows you to aggregate disk IO performance to achieve higher read/write rates than with a single disk alone. You can also aggregate disk space this way, too, if you didn't know.
It's almost 0100 now, I'd much rather be sleeping or playing TF2 than helping
rebuild from backups.
While doing some testing with Cisco Access Registrar (CAR) 5.0, I noticed that the test user had hundreds of old sessions hanging around:
--> query-sessions /r with-User X
Sessions with-User X for /Radius
Sessions for /Radius/SessionManagers/session-mgr-1:
S432 Key: 04000000000CF660, NAS: NAS.slaptijack.com, NAS-Port: 0, User-Name: X, Time: 21:19:26, Acct-Session-Id: 04000000000CF660
<snipped a lot of lines>
--> count-sessions /r with-User X
Total 469 session(s) with-User X for /Radius/SessionManagers
It seemed like a good idea to have these old sessions be cleared out rather than sitting around forever. These sessions were likely created during testing when things may not have worked perfectly right. Although this can be done by hand (release-sessions), I decided to have it done automatically. You have to change two settings.
First, set the PhantomSessionTimeOut in your session manager. This controls how long CAR waits for an accounting start record to appear before it decides the session is never going to start.
--> cd /Radius/SessionManagers/session-mgr-1/
[ //localhost/Radius/SessionManagers/session-mgr-1 ]
Name = session-mgr-1
Description =
Type = Local
IncomingScript =
OutgoingScript =
AllowAccountingStartToCreateSession = False
SessionTimeOut =
PhantomSessionTimeOut =
SessionKey = User-Name
ResourceManagers/
--> set PhantomSessionTimeOut "10 Minutes"
Set PhantomSessionTimeOut "10 Minutes"
Second, set the SessionPurgeInterval under /Radius/Advanced. This controls how often CAR checks for and purges sessions that have timed out.
--> cd /Radius/Advanced/
[ //localhost/Radius/Advanced ]
<snip>
SessionPurgeInterval =
<snip>
--> set SessionPurgeInterval "1 Hour"
Set SessionPurgeInterval "1 Hour"
--> save
Validating //localhost...
Saving //localhost...
--> reload
Reloading Server 'Radius'...
Server 'Radius' is Running, its health is 10 out of 10
As you remember, we had 469 sessions for user X. After an hour…
--> count-sessions /r with-User X
No active sessions found with-User X for /Radius/SessionManagers
global is necessary
When I started out programming in Python, I didn't really like
global. For a long time I considered it unaesthetic, annoying,
and on the whole an irritating wart of the bytecode implementation. As I mentioned recently,
I have come around to a different view of global, and it goes like
this.
If you want to have both global variables and lexically scoped local variables, you have to be able to tell whether a given name being assigned to in a function is a local or a global variable at the time that the function is being defined. Assuming that you want as much as possible of this to be implicit for various reasons, there are three relatively reasonable choices that I can think of:
(If a name is never assigned to within a function but only read from, it's either a global variable or a 'use of an undefined value' error. Python opts to consider it a global variable.)
The third option is fragile (and un-Pythonic). This leaves you with a
choice between the first and the second options, and either way you are
going to need a keyword for it. Python makes the decision that writing
to global variables will be rare and so it forces you to declare them
explicitly; local variables, the common case, are handled implicitly. So
it needs global, because having local instead would be worse (and
having neither would be much worse).
(This decision might be either a pragmatic one, based on what was expected to be common, or a philosophical choice to make global variables more inconvenient in the hopes of making them less common. I don't know the Python history involved, so I have no idea which it was.)
Other languages make different choices here, sometimes for philosophical reasons that come down on the other side and sometimes just for historical ones (eg, if they started out without local variables or lexical scoping at all).
The core problem with the fully implicit option, why it is fragile in many ways, is that it makes the meaning of a function dependent on its surrounding context. You can't just read a function and know what it does and what it manipulates; instead you have to know what global names exist when the function is defined.
One consequence of this is that anything that changes what global names are defined can change the meaning of the function. In a language like Python where function definition is an ordinary executable statement, one done immediately when encountered, merely moving a function definition forward or backwards inside a file could change the function's meaning even without any other code changes (as you move it before or after where global names are created or even deleted).
In this post I'll cover the difference between multi-core concurrency that is often referred to as Scale-Up and distributed computing that is often referred to as Scale-Out mode.
Project Honeynet just released its latest Forensic Challenge 5 - Log Mysteries. It is based on logs from a compromised virtual server and requires quite a bit of digging through messy log data.
The Challenge:
Analyze the attached sanitized_log.zip [A.C. – get the logs here] and answer the following questions:
- Was the system compromised and when? How do you know that for sure? (5pts)
- If the system was compromised, what was the method used? (5pts)
- Can you locate how many attackers failed? If some succeeded, how many were they? How many stopped attacking after the first success? (5pts)
- What happened after the brute force attack? (5pts)
- Locate the authentication logs, was a bruteforce attack performed? if yes how many? (5pts)
- What is the timeline of significant events? How certain are you of the timing? (5pts)
- Anything else that looks suspicious in the logs? Any misconfigurations? Other issues? (5pts)
- Was an automatic tool used to perform the attack? if yes which one? (5pts)
- What can you say about the attacker's goals and methods? (5pts)
Bonus. What would you have done to avoid this attack? (5pts)
Go get the challenge here and get to solving it – you have about a month. And, yes, there will be prizes too!
Finally, if you really want to make me happy (hehe...who’d want that? :-)), please invent a new approach while solving the challenge.
by anton@chuvakin.org (Anton Chuvakin) at September 01, 2010 06:22 PM
Can you have your ACID cake and eat your distributed database too? Yes explains Daniel Abadi, Assistant Professor of Computer Science at Yale University, in an epic post, The problems with ACID, and how to fix them without going NoSQL, coauthored with Alexander Thomson, on their paper The Case for Determinism in Database Systems. We've already seen VoltDB offer the best of both worlds, this sounds like a completely different approach.
The solution, they propose, is:
There’s an amusing thread on the LOPSA Discuss list going on right now. It’s called “What Animal is a System Administrator“.
I was leaning toward the beaver until I saw the post by Paul Graydon, who recommends the Pooka, aka the Púca:
The púca has the power of human speech, and has been known to give
advice and lead people away from harm. Though the púca enjoys
confusing and often terrifying humans, it is considered to be
benevolent.
It’s like I’m looking in a mirror.
Well it is the first of the month and it seems like I have internet access still. That's good news.
Let's see what happens when my DHCP lease expires. That's the real test.
I don't want to push my luck, but it looks like good news so far!
The other day, I caught a message that KSplice was available for Fedora. I thought I’d be a wiseguy and I replied “Yeah, great. Call me in 20 years when it’s available for RHEL”. Well, as several people pointed out, it turns out the joke is on me.
As you can see, it’s actually available for many Linux-based OSes at various prices. I suppose my confusion stemmed from the fact that I misunderstood what ksplice was.
My impression from a long time ago, when it first came out on Ubuntu, was that it was essentially a kernel patch that dynamically loaded patches and provided the ability to rebootstrap a kernel that was already loaded. As it turns out, it’s a commercial product that offers the ability to not have to reboot your machine to update the kernel. Let me be frank: I’m all about that.
The part that I kind of object to is in the press release, of all things. It’s the opening line of the company profile:
Ksplice is an enterprise software company making reboots a thing of the past.
Please, let’s be honest. Reboots are inevitable. Using this product as a stop-gap for untimely reboots may be handy (at the low low price of $50 per year per server), but it can’t (and shouldn’t!) replace regular reboots.
The reasons for scheduled rebooting of machines are numerous. The primary one is that regular reboots assure that the machine is configured to boot correctly. If you’ve got a machine that’s got over 100 days of uptime, how do you know it will start correctly? You last booted it last quarter…what has happened to that machine since then? Changes in installed services, mountpoints, etc…it’s hard to tell if it’s going to be in a known-good state when it comes back up after a power failure.
Another reason to reboot occasionally is to clean up the running state of the machine. What’s that you say? Your machine is running fine? Well, sure, it may be, but how much cruft is left hanging that isn’t obvious? Have you ever used kill -9? Do you know for sure that there aren’t any memory leaks in your running services? Are there any processes that hung while reading I/O and are now stuck in uninterruptible sleep?
Yes, there are lots of things that happen to servers over the course of doing their jobs. A reboot fixes many of them. The only argument against it is uptime.
I’ve written about uptime before, and I still feel the same way. Modern system administration has advanced beyond a single server providing a service. Uptime needs to be measured from the outside in, and according to the availability of the service, not the individual servers comprising that pool.
Feel free to disagree. Let me know if you’ve got an uptime of a year plus and you’re proud of it, or if you would be ashamed to be in that position.
Edit
This entry is causing quite a stir on Reddit. Cxunix from twitter also weighed in on his blog, servermanaged.it (link is in Italian, English translation here).
As promised, here is another detailed SIEM whitepaper called “A Pragmatic Approach to SIEM: Buy for Compliance, Use for Security” that I wrote for a great team at Tripwire earlier this year.
“While recent economic troubles might have something to do with it, many organizations today seek to only do a bare minimum of security. To be more precise, they try to do what they think is the bare necessary minimum. Their perception that security “due diligence” can be reduced all the way down to the level prescribed by regulations, such as PCI DSS, is more common than ever today. All too common result of this thinking is security breaches and other damaging events.
This trend has affected many security safeguards, and SIEM and log management are hard hit by this as well. It is very common to deploy these technologies in order to satisfy the compliance check box. In this paper we will analyze this trend and provide useful guidance for getting value out of SIEM and log management tools while focusing on protecting systems and data – and not simply on checking the box.”
Get the paper here.
by anton@chuvakin.org (Anton Chuvakin) at September 01, 2010 06:11 AM
>>> print("".join(" hello world ".split()))
helloworld
The key to the above is that the String method split() separates the string on any amount of whitespace when no separator is specified.

I’ve just released a simple chef cookbook that will install nodejs from source. You can check it out directly from github or download it from the opscode cookbook site. Let me know what you think if you find it useful.
net.ipv4.conf.*.rp_filter can work
First, the background. net.ipv4.conf.*.rp_filter controls some IP address source validation filtering done on incoming IPv4 packets. It has three values:
| Value | Meaning |
| --- | --- |
| 0 | No filtering is done. |
| 1 | Packets are discarded if they come in on any interface except the one that a reply to the source IP would go out on. |
| 2 | Packets are discarded if a reply to the source IP could not be sent out any interface. |
(A more formal description is in ip-sysctl.txt in the kernel documentation. Like all interface sysctls, it can be set separately for each interface, as a default, and for all interfaces.)
I don't understand how this can possibly work. Well, I understand how it works, I just don't understand how it can possibly do any good in most configurations. And I don't understand how a setting of '1' can possibly work at all in multihomed configurations where the multihomed machine is not the sole router for every network it's connected to that is not where its default route points.
First, as far as I can tell a setting of '2' is equivalent to '0' if you have a default route set (the usual case). With a default route set, all source IPs are reachable and so '2' will never discard packets, which is exactly the same as '0'.
For a machine with a single network interface and a default route, all settings are equivalent (for the same reason as above; all source IPs are reachable through your single interface). If you do not have a default route, either '1' or '2' will discard packets that come from networks you do not have routes for.
It is the multihomed case where things explode. Suppose that you have
a multihomed host with two network interfaces, net-1 and net-2, with
IP-1 on net-1 and IP-2 on net-2. With an rp_filter value of 1, a
machine on net-2 cannot talk to this machine's IP-1 address unless the
packets pass through the multihomed machine on the way to net-1, ie the
multihomed machine is the router for the net-2 machine. If the packets
go through another router, they will arrive on the multihomed machine's
net-1 interface but the replies would go out the net-2 interface, so
they fail the check.
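The failure can be modeled in a few lines. This is a simplified sketch of the three settings as described in the table, not the kernel's actual code; the addresses, the route list, and the function names are invented, and the interfaces are named after the net-1 and net-2 of the example.

```python
import ipaddress


def reply_interface(routes, ip):
    """Longest-prefix match: the interface a reply to `ip` would leave on, or None."""
    address = ipaddress.ip_address(ip)
    best = None
    for prefix, interface in routes:
        network = ipaddress.ip_network(prefix)
        if address in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, interface)
    return best[1] if best else None


def rp_filter_accepts(mode, routes, source_ip, arrival_interface):
    """Apply the rp_filter rule for `mode` (0, 1 or 2) to one incoming packet."""
    if mode == 0:
        return True
    out = reply_interface(routes, source_ip)
    if mode == 1:
        return out == arrival_interface
    if mode == 2:
        return out is not None
    raise ValueError("rp_filter mode must be 0, 1 or 2")


# Hypothetical multihomed host: one address on each network, default route via net-1.
routes = [
    ("10.1.0.0/24", "net-1"),
    ("10.2.0.0/24", "net-2"),
    ("0.0.0.0/0", "net-1"),
]
routes_no_default = routes[:2]

# A net-2 machine reaches IP-1 through some other router, so its packet arrives on net-1.
for mode in (0, 1, 2):
    print("mode", mode, "accepts:", rp_filter_accepts(mode, routes, "10.2.0.50", "net-1"))
```

Only mode 1 drops the packet. Mode 2 differs from mode 0 only when no route covers the source, as with `routes_no_default` and an outside address.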
Effectively this creates a bad version of an isolated interface, with the packet reachability restrictions but without the multiple split routing tables that make multihomed hosts actually work. As a bonus it hides the restriction deep in the networking sysctls, where you have to be an expert to find it.
(I suppose that there are some advantages to this half-hearted approach, in that it avoids some limits in the policy based routing version of it.)
By the way, I stumbled over this courtesy of Ubuntu 10.04 setting
rp_filter to 1 by default. We have multihomed non-routing machines,
and when we set up an Ubuntu 10.04 test version things promptly
exploded. If I was not already suspicious of network sysctls, we could
have spent quite a lot of time trying to find out just why the machine
was ignoring certain sorts of network traffic.
(As it was I did 'sysctl -a | fgrep net. | sort' on both a 10.04
and an 8.04 machine and then looked for settings that were different.
Ubuntu 10.04 may not be the first version that sets this, but 8.04
definitely didn't.)
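The same comparison can be scripted. The sketch below assumes each machine's output has been saved as text in the usual `key = value` form; the two short dumps are made-up samples rather than real output from either release, and the function names are invented.

```python
def parse_sysctl(text):
    """Turn 'key = value' lines into a dict, skipping lines without an '='."""
    settings = {}
    for line in text.splitlines():
        key, separator, value = line.partition("=")
        if separator:
            settings[key.strip()] = value.strip()
    return settings


def diff_sysctl(old, new):
    """Return {key: (old_value, new_value)} for keys that differ or exist on one side only."""
    changed = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changed[key] = (old.get(key), new.get(key))
    return changed


# Made-up sample dumps standing in for the saved output of the two machines.
old_dump = "net.ipv4.conf.all.rp_filter = 0\nnet.ipv4.ip_forward = 0\n"
new_dump = "net.ipv4.conf.all.rp_filter = 1\nnet.ipv4.ip_forward = 0\n"

differences = diff_sysctl(parse_sysctl(old_dump), parse_sysctl(new_dump))
for key, (old_value, new_value) in differences.items():
    print("%s: %s -> %s" % (key, old_value, new_value))
```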
PS: a much more useful version of this sysctl would be a 'private' flag on interfaces. If an interface had the private flag set, packets with a source IP address that was routed through that interface would only be accepted on that interface; all other interfaces would discard such packets.
Interesting technology-related news from around the web for 2010-08-31:
Network monitors are a dime a dozen. You can find a network monitor to fit just about every need and every taste. Because of the abundance of monitors available, it’s a real needle-in-a-haystack adventure to find the one that fits your bill. And since not all of these tools are free — unless the tool you’re looking at has a demo — you could be out some cash until you find the right one.
That’s why when you find a tool that has many of the features you need at a cost that is appealing to your budget, it’s time to install it and use it. One such monitor is Zenmap. Zenmap is the official cross-platform, GUI front-end for the Nmap security scanner. But does Zenmap fit your needs? Is it the perfect tool at the perfect price? Let’s dig in and find out.
Zenmap is for any network or security administrator who needs to keep a constant check on their network topology. With its next-to-zero learning curve, just about any network administrator can have all of the information they need quickly. Zenmap will work for any size company or even a single-user consultancy, where a quick scan of a network topology can make the difference between spotting a security issue and finding a resolution or, well…not.
There are two key issues Zenmap solves. One is making the more-challenging Nmap scanner usable for the average administrator. Nmap is a console-only tool and the majority of administrators do not want to spend their day at the console (with a nod to the old-school Linux and UNIX admins who would much rather spend their day at the command line than in a GUI tool). The other is topology mapping: Zenmap gives administrators a tool where they can actually see an interactive, animated visualization of the hosts on their network.

The interactive Topology mapping allows you to add/remove hosts/features, drag and resize the map, zoom in and out of the map, and much more.
There is very little wrong with Zenmap. But if I were to really dig deep, I would have to say the interactive Topology Map takes a bit of trial and error to get used to. And the lack of any discernible legend for colors or symbols makes it necessary to consult documentation to help read the topology map.
With Nmap being one of the standards by which other scanners are judged, having an easy-to-use GUI front end for this tool makes perfect sense for any network administrator. If you are looking for a user-friendly, flexible network scanner and do not want to spend any of your precious IT budget on said scanner, Zenmap is the tool for you.
Competitive products
Have you taken advantage of the power of Zenmap? If so, what was your experience? Would you recommend this network security scanning solution to your fellow administrators? Share your experience/thoughts with your fellow TechRepublic readers.
Debates sometimes arise, both within academic circles and outside of them, over the necessity of high-intensity secure deletion techniques. Find out the true state of affairs for secure data disposal.
The state of the art of secure data disposal is, like that in most technical spheres of knowledge, always subject to change as researchers do their work. One might imagine that this involves new techniques for more effective data recovery that employs magnetic force microscopes and similarly high-cost solutions, countered by new advice for how to defeat such efforts when disposing of hard drives and other storage media.
One example of an impressive data recovery effort is that of the remains of hard drives from the Columbia space shuttle disaster, which ultimately led to the recovery of experimental data. Six months after the shuttle came apart on atmospheric reentry, a damaged hard drive was found in a dry lakebed and delivered to data recovery specialists at Kroll Ontrack Inc. Some time in the next four years or so, 99% of the data stored on the drive was recovered. The drive was eight years old before the shuttle disaster; it was delivered to the people who recovered the data from it looking like a melted down piece of slag and then damaged further during the recovery process — but recovery was a success.
On the other hand, two other drives involved in the shuttle disaster were complete losses.
There is a persistent myth to the effect that to securely delete everything from a hard drive one must overwrite it thirty-five times with random data. This myth arises from a superficial read and misunderstanding of Peter Gutmann’s 1996 paper, Secure Deletion of Data from Magnetic and Solid-State Memory. The truth of the matter, as presented in his paper, is that the 35-pass overwrite sequence serves only to cover, in a single procedure, the secure deletion needs of any of several different drive technologies. Any one specific data storage technology requires only some lesser technique to ensure secure deletion.
Perhaps more interesting is the fact that, for the most modern hard drive technologies, a single complete overwrite of a drive with zeros should be sufficient. Part of the reason for this is the fact that data density on a drive is much greater than it used to be. In layman’s terms, “the bits are smaller”, which means that when rewriting, there is less room for old data to be left behind in a recoverable manner. A fair amount of redundancy of stored data occurred on older, lower density drives because the reading and writing devices were not as precise, and small deviations would leave random small areas unaffected on a single overwrite.
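For illustration only, a single zero-fill pass over one file might look like the sketch below. The function names are invented, and this works at the file level rather than on a whole drive. Overwriting in place does not guarantee the old blocks are gone on journaling or copy-on-write file systems, or on flash media with wear leveling, so treat it as a picture of the idea rather than a disposal tool.

```python
import os
import tempfile


def overwrite_with_zeros(path, chunk_size=1024 * 1024):
    """Overwrite every byte of the file at `path` with zeros, in place, in one pass."""
    remaining = os.path.getsize(path)
    zeros = b"\0" * chunk_size
    with open(path, "r+b") as handle:
        while remaining > 0:
            step = min(chunk_size, remaining)
            handle.write(zeros[:step])
            remaining -= step
        handle.flush()
        os.fsync(handle.fileno())  # ask the OS to push the zeros out to the device


def demo():
    """Create a scratch file, zero it, and return its final contents."""
    descriptor, path = tempfile.mkstemp()
    try:
        with os.fdopen(descriptor, "wb") as handle:
            handle.write(b"secret data" * 1000)
        overwrite_with_zeros(path, chunk_size=4096)
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        os.remove(path)


contents = demo()
print("bytes in file:", len(contents), "distinct byte values:", set(contents))
```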
In a recent epilogue to his paper, Gutmann quoted himself responding to a researcher who considered doing some data testing:
Any modern drive will most likely be a hopeless task, what with ultra-high densities and use of perpendicular recording I don’t see how MFM would even get a usable image, and then the use of EPRML will mean that even if you could magically transfer some sort of image into a file, the ability to decode that to recover the original data would be quite challenging. OTOH if you’re going to use the mid-90s technology that I talked about, low-density MFM or (1,7) RLL, you could do it with the right equipment, but why bother? Others have already done it, and even if you reproduced it, you’d just have done something with technology that hasn’t been used for ten years. This is why I’ve never updated my paper (I’ve had a number of requests), there doesn’t seem to be much more to be said about the topic.
Recent papers by other researchers may seem to contradict Gutmann’s results. He does address some of this in his epilogues. Judging by both his epilogues and an independent look at reporting on such papers, it seems that such papers are in some cases misguided, and in others not contradictory of Gutmann’s results so much as relating to a specific technology that falls within the range of Gutmann’s more general overview.
While no single storage technology requires Gutmann’s described technique for dealing with all technologies, few of us have the time or inclination to double-check the specific technologies and the approaches required for each of them before tackling the task of secure data disposal. If you want to run a secure data disposal service where you expect to need to deal with many, many different storage devices regularly, it pays to know the specific techniques for specific technologies, and to apply them, if only because the time and resource costs for secure deletion will add up quickly. If you are a more typical user who just needs to get rid of a hard drive every couple years or so, the time spent keeping track of drive technologies and data disposal techniques is probably worth more to you than the time it takes a computer to perform Gutmann’s thirty-five overwrite “scorched earth” technique.
Some incredibly effective data recovery techniques may yet be developed that require new secure disposal techniques, in the future. Hopefully a diligent “scorched earth” approach today will defend effectively against such approaches tomorrow, but only time will tell. Meanwhile, given today’s technologies, Gutmann’s advice for data disposal still seems to be appropriate and well considered:
There are two ways that you can delete data from magnetic media, using software or by physically destroying the media. For the software-only option, to delete individual files under Windows, I use Eraser and under Linux, I use shred, which is included in the GNU coreutils and is therefore in pretty much every Linux distro. To erase entire drives I use DBAN, which allows you to create a bootable CD/DVD running a stripped-down Linux kernel from which you can erase pretty much any media. All of these applications are free and open-source/GPLed, there’s no need to pay for commercial equivalents when you’ve got these available, and they’re as good as or better than many commercial apps that I’ve seen.
For the physical-destruction option there’s only one product available (unless you want to spend a fortune on something like a hammer mill), but fortunately it’s both well-designed and inexpensive. DiskStroyer is a set of hardware tools that lets you both magnetically and physically destroy data on hard drives, leaving behind nothing more than polished metal platters. It’s been carefully thought out and put together, there’s everything you need included, down to safety glasses for when you’re disassembling the drive. It’s had very positive reviews from its users. If you really want to make sure that your data’s gone, this one gets my thumbs-up (and this isn’t a paid endorsement, if only other technical products had this level of thought put into the workflow and usability aspects).
Given recent concerns over the possibility of electronic devices carrying spying technology, though, it might behoove you to destroy drive electronics regardless of how you erase the drive, even if you do not go so far as to melt down drive platters. It all depends on how paranoid you want to get.
It probably wasn’t how Google’s CEO Eric Schmidt (of “Don’t Be Evil” fame) envisioned things. Earlier this month protesters converged on the Google campus to protest the Google-Verizon joint proposal to keep the internet neutral. Called a “joint policy proposal for an open internet,” it was innocuous-sounding enough, but to many it is being seen, above all, as a sellout of the wireless internet where Google itself is keen to play. There’s some thoughtful consideration given in the proposal to treating all content equally, but only where “wireline networks” are concerned. There is also a call for “network transparency,” but since decades into internet build-out there is still little network transparency, this seems more like a wish than a policy suggestion.
The issue of net neutrality is awash with a jumble of technology, politics, and business. Way back in 1978 Rob Kling wrote in Telecommunications Policy:
“Proposals which focus on changing the kind or quality of data available to public policy makers assume that ‘rationality’ is inherent in the data or techniques used to generate it. Yet the evidence seems to indicate that whatever ‘rationality’ may be found in policy-making is as much a feature of the policy-making process as of the data that informs it.”
This seems to be true of the latest Google-Verizon proposal. Its pronouncements are assumed to be self-evident. Little glimpse is offered of the mountain of information the two firms together could marshall to strengthen their arguments. Like many aspects of the intersection between technology and public policy, extended discourse about complex proposals is readily sideswiped by vested interests, political calculations, and guesses hazarded about future technologies.
NPR’s Tom Cole sees several flaws in the proposal, and his views are typical. The FCC is to have no real enforcement power, instead relying upon a yet-to-be-identified advisory group operating through a “complaint-driven,” case-by-case oversight. On the other hand, the new Wild West is wireless broadband, whose providers are free to create “additional, differentiated online services” within their monopolies.
Cable TV was once one of those “differentiated services.” Seen from one perspective, it grew and made possible the wired Internet speeds many now enjoy. From a different perspective, it created a tangled bundle of services including “free” and “pay” TV, local and long distance voice, and broadband - with typically only one or two providers in a market. The palette of services is deep compared to years past, but those who want to limit costs may have a hard time untangling the service bundles. For example, in the case of Verizon, bundles offering wireless, broadband and TV services are least expensive when purchased together. Further, the bundled services are each delivered on a single fiber infrastructure — no picking and choosing infrastructure, so make room for that now-mandatory battery backup.
The Electronic Frontier Foundation’s Cindy Cohn applauds the proposal for avoiding direct FCC control of content, and considers the role of outside standards bodies potentially helpful, though not without risk. (Some such bodies, e.g., IEEE standards committees requiring 75% concurrence and debates lasting years, can be closed to public scrutiny; consider the case of Ultra-wideband below.) But she worries that the proposal sidesteps already exposed issues of censorship, content control, and adds her voice to the outcry over ceding the wireless Internet to purely mercantile interests — the portion “currently most lacking in openness and neutrality.”
Despite the lofty goals of net neutrality, as Harish Vadad points out, “not all packets are created equal and not all applications will get the priority”. QoS requirements for Web content and email are different from those for voice and video. Traffic shaping with QoS mechanisms already comes into play, as do protocol differences (e.g., TCP vs. UDP). Service-level failures, such as those broadband HDTV users may experience around 10 p.m. on weeknights, can occur through intrinsic factors, not just the oft-mentioned BitTorrent “abuse.” My contract with Verizon gives me only an “up to” guarantee, and for downloads the service often exceeds that, but uploads are another story (see Figure A).
It doesn’t take much imagination to see why Google would be concerned with QoS issues. Now that Google Voice has morphed into a long distance voice communications provider, one can imagine why peering and generally playing nicely with Big Telecom might make sense to Google. Verizon, a firm that in my geographical area, based solely on my own experience, is executing well with its fiber infrastructure, may want to head off a capabilities end run by the pure Internet players.
Anyone assuming that net neutrality is a purely technical discussion should have their connectors cleaned. An analysis by the Sunlight Foundation recounts that just as there was Congressional opposition to plans as disruptive as a la carte cable service (imagine paying only for desired channels), there was opposition to net neutrality. No fewer than 74 Democratic and 171 Republican members of the House wrote the FCC in separate letters. Their gentle reminder? That Congressional direction is required before acting on net neutrality. Only last June, two of the five members of the FCC voted against a public hearing on overhauling the nation’s broadband regulations and addressing net neutrality. The political rationale varies. Some claim that the FCC doesn’t have the authority to reclassify broadband to Title II Telecommunication services, or that it would be challenged in court. Skeptics have a more cynical interpretation. In 2006, the last time Congress considered telecommunications legislation, the industry poured $59 billion into lobbying (source: Wall Street Journal via CNET).
On the other hand, a smaller voice consisting of four members from the House Energy and Commerce Subcommittee wrote the FCC chairman to critique the Google-Verizon proposal, identifying the segregation of wireless and overly broad description of “managed services.” In their letter they wrote, “Rather than expansion upon a proposal by two large communications companies with a vested financial interest in the outcome, formal FCC action is needed. The public interest is served by a free and open Internet that continues to be an indispensable platform for innovation, investment, entrepreneurship and free speech.” Senator Al Franken has referred to this as “the First Amendment issue of our time.”
Meanwhile, H.R. 3458, the Internet Freedom Preservation Act introduced by Reps. Markey and Eshoo in 2009 seems to have spent the last year stalled in committee.
Lawyer Mitchell Lazarus wrote in IEEE Spectrum last year that net neutrality is just wishful thinking. In 2002, the FCC said that Internet providers were not required to open facilities to other ISPs. Monopolies with vested interests moved to protect themselves. For instance, wired network provider Madison River Communications blocked access by Vonage. Comcast blocked content that might compete with its pay-per-view service. As a result, and more importantly, smaller entrepreneurs whose inventions might interfere with entrenched paradigms must worry that they will not only be drawn into long court battles, but more likely, frozen out of courtrooms because they can’t afford to litigate against wealthy adversaries. This makes raising capital even more difficult.
“Somebody is always regulating your channels. Will it be Comcast, Verizon, or do we have a rule through government that specifies minimal interference?” asked Google skeptic Siva Vaidhyanathan on a recent WNYC call-in show.
Some defensive actions ISPs might take could be illegal. Even more insidious are the actions they can take which are perfectly legal. They can demand expensive long term, or volume-based commitments to gain access to broadband service categories which are beyond the reach of startups and smaller firms. Perfectly legal. Expect that the cloud in cloud computing will need to be closer to the ground.
Lazarus bolsters his argument by reviewing the 12-year-old debate over Ultra-wideband. Here was an emerging technology supported by start-ups and radar companies, but opposed by just about every other existing corporate user of the spectrum. Not only did the fight drag on for more than a decade, despite what Lazarus argues was clear evidence that the new technology would not interfere with existing services, but the IEEE was unable to agree between two competing standards (search for “MB-OFDM” and “DS-UWB” for the sad history of competing standards) and disbanded its standards committee in 2006. The history of Ultra-wideband is an object lesson for anyone holding out hope for a straightforward role of professional associations in the Google-Verizon proposal.
The single biggest impact of the Google-Verizon proposal would be felt in one of the hottest areas for investment in a weak economy. It would free the wireless Internet to engage in more price-based segmentation of services. Already in this space, and controlling not only the current revenue model but the existing infrastructure, big firms will stand to gain handsomely.
The Google-Verizon proposal tests our understanding of the distinction between large enterprises with government-sanctioned near-monopolies, and regulated public utilities accountable to a broader set of societal guidelines. Utilities can be privatized. Companies can act in the public interest and, if there is widespread adoption, economies of scale can result. But should the wealthy be allowed to buy passes for the HOV lane?
1. Media and Telecom company size and the cost of litigation may exert undue pressure on fairness and policy formation. Is there a guarantee that profit-driven investment attracted to wireless enhanced services will benefit small and medium sized entrepreneurs as much as it will benefit the monopolies?
2. While file sharing of video is singled out as a resource hog, in fact the data to support this is not public. Can you show us the data about BitTorrent, streaming TV, and the like?
3. Cisco would like to upgrade both the wired and the wireless internet to such an extent that there’s plenty of headroom for all to play. Wouldn’t this offer a provider-neutral environment, or is the wireless genie out of the bottle?
4. Does this proposal institutionalize the disparity between rural have-nots and their wealthier urban/suburban counterparts? Will it be even worse for wireless?
5. Figure A shows an account for 35 Mbps down/20 Mbps up with Verizon FiOS. The Verizon-recommended speedtest shows reduced upload speeds (though the FCC-sponsored tests of small file transfers say otherwise). Where is the transparency? Will the ordinary consumer with a modest at-home network be able to monitor an ISP’s service level? Will they need to?
6. Trust in telecommunications providers is not strengthened by AT&T’s collaboration with the NSA to wiretap and analyze domestic U.S. communications. Should we place additional trust in Big Telecom to segregate and price content for the wireless internet while handling more and more privacy data?
7. Is it still impossible to envision partnerships between small businesses and the current crop of broadband brokers?
I spent part of today writing a quick one-off data conversion program. The core of it was a function that filtered items from a list through a number of things in order to sort them into the right category. Once the dust settled on all of the sorting needed, the function had quite a lot of stock arguments, things that didn't vary from call to call in my program. In fact, an unwieldy number of them.
There are at least three vaguely Pythonic options for how to deal with this (plus how I actually did), but what interests me in retrospect is the one answer that I didn't even think about. Namely, global variables.
There are all sorts of reasons to avoid global variables in general, but this was a one-off program and if I'm being honest, that's what all of those stock parameters really were. I was making them local variables in the calling function and then passing them in to the classifying function not so much because it was a good idea but because that's what I do in Python. I just don't use global variables very much even when they'd arguably make sense, and when I do use them I feel irritated.
As best I can tell, what does it is the pesky global keyword. Having
to declare variables global any time I want to rebind them adds just
enough extra friction to using global variables in practice that I would
rather not bother and instead pass lots of things around as parameters.
I generally resort to global variables only when passing the same
information as parameters would add arguments to too many layers of
function calls.
(This is the situation where you have four or five layers of function calls and some of the stuff down at the leaves wants to gather some expensive piece of information only once. The nominally logical thing to do is to call the 'gather information' function once at the start of your program and then pass the parameter all the way down to the leaves, but that means you have to pass the information object through all of the intermediate layers, where all it does is clutter up parameter lists. Really, you want to put it in a global variable, especially if you have several different clusters of these functions that want different chunks of information; passing the information they need down as parameters doesn't scale.)
Part of the friction is the annoyance of the extra line in any function
that will rebind the global variable. But another part is just having to
think about it at all, partly because I sort of consider global to be
a wart (especially because I know what the bytecode is doing behind
the scenes).
(Global's not really a wart, but that's another entry.)
The three Python options that immediately come to mind are:
Since this was a quick hack, I was lazy and did the poor man's structure: I made a tuple with all of the stock parameters and just passed in the tuple (and then unpacked it in the classifying function). This is less aesthetically pleasing than a structure, but also less code, and it is the obvious next step when one's parameter list spirals out of control and most of it is the same from call to call.
(My eventual code had two arguments that varied from call to call and six that were the same, packed into a tuple. I'm sure that this is a code smell, but it was a quick hack.)
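For the curious, the "poor man's structure" looks roughly like the sketch below: two arguments that vary per call, plus one tuple holding the six stock ones, unpacked at the top of the classifying function. Everything here (the function name, the categories, the meaning of each stock parameter) is invented for illustration; it is not the actual conversion program.

```python
# Hypothetical example: all names and categories are made up for illustration.

def classify(item, source, stock):
    # Unpack the six stock parameters that never vary from call to call.
    known_hosts, ignored, aliases, default_cat, min_len, strict = stock
    name = aliases.get(item, item)
    if name in ignored or len(name) < min_len:
        return None
    if name in known_hosts:
        return "host"
    if strict and source != "trusted":
        return None
    return default_cat


def main():
    # Build the tuple once; the order must match the unpacking in classify().
    stock = (
        {"alpha", "beta"},   # known_hosts
        {"junk"},            # ignored
        {"a": "alpha"},      # aliases
        "misc",              # default_cat
        2,                   # min_len
        True,                # strict
    )
    items = [("a", "trusted"), ("junk", "trusted"),
             ("gamma", "other"), ("delta", "trusted")]
    return [classify(item, source, stock) for item, source in items]
```

The obvious weakness is that the tuple's order is an unwritten contract between the caller and the unpacking line, which is part of why this is only suitable for a quick hack.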
I know y'all can't live without another update so here it is.
The VerizonSupport twitter account sent me a secret URL to give them my account info and problem description. After filling it twice (separated by 2 days), I got no phone call, no email, no results.
Today I called and was told that the IVR system transferred my phone call to billing because I was entering my phone number (as asked) but since I don't have Verizon phone service it was confused. That is the phone number on my account, and it certainly is able to look up my account after I've entered it, but the person assured me that this was the problem. I should select the option where I enter my account number instead of my phone number and it should work... promise.
When I got home (where I have the account number) I did as requested and of course the system said it is transferring my account to billing.
So what to do?
Well, I've tried billing and tech support with no luck. I decided to call sales. Stephanie and I had an ok conversation. She said that last month my account was "in treatment" and now it definitely isn't so there should be no outage on Wednesday. I pointed out that the IVR system disagrees, but she said I should "let it go". She also said that if I do have an outage on the first of the month, they can cancel the account and recreate it. I'd have a 20 minute outage. She wrote all of this up in my account notes.
Wednesday I'll be "oncall" for work from 4pm to midnight. If there is an outage in the morning, I'll be spending all day getting it fixed so that I can have connectivity for my oncall shift.
I've spent more than 10 hours on the phone with Verizon at this point.
Blog Update
I've just updated the home-grown javascript I was using on this blog to be jQuery powered.
This post is a test.
I'll need to check but I believe I'm almost 100% jQuery-powered now.
AJAX Proxies
It is a well-known fact that AJAX requests are only allowed to be made to the server the javascript was loaded from: the so-called same-origin security restriction.
To pull content from other sites users are often encouraged to write a simple proxy:
- http://example.com/ serves Javascript & HTML.
- http://example.com/proxy/http://example.com allows arbitrary fetching.
Simples? No. Too many people write simple proxies which use PHP's curl function, or something similar, with little restriction on either the protocol or the destination of the requested resource.
Consider the following requests:
- http://example.com/proxy.php?url=/etc/passwd
- http://example.com/proxy.php?url=file:///etc/passwd
If you're using some form of Javascript/AJAX proxy make sure you test for this. (ObRandom: Searching google for inurl:"proxy.php?url=http:" shows this is a real problem. l33t.)
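As a hedged sketch of the sort of check a proxy should make before fetching anything, the Python below accepts only absolute http(s) URLs whose host is on an allow-list. The host names and the function name are assumptions made up for this example, and it does not attempt to deal with redirects; the same idea applies to a PHP proxy.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; a real proxy would list the sites it actually needs.
ALLOWED_HOSTS = {"api.example.org", "feeds.example.net"}
ALLOWED_SCHEMES = {"http", "https"}


def is_safe_proxy_target(url):
    """Return True only for absolute http(s) URLs pointing at an allowed host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed input (e.g. a broken IPv6 literal) is rejected outright.
        return False
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        # Rejects file://, and bare paths like /etc/passwd (empty scheme).
        return False
    host = parts.hostname
    return host is not None and host.lower() in ALLOWED_HOSTS
```

With this in place both of the requests listed above are refused, since neither has an http or https scheme.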
ObQuote: "You're asking me out? That's so cute! What's your name again? " - 10 things I hate about you.
Interesting technology-related news from around the web for 2010-08-30:
This is a guest post from Larry Dignan, Editor in Chief of ZDNet, TechRepublic’s sister site. You can follow Larry on his ZDNet blog Between the Lines (or subscribe to the RSS feed).
—————————————————————————————
It has been a rough few days for anyone interacting with the state of Virginia following an IT outage that affected 26 state agencies. Can a storage area networking failure really cripple a state’s IT systems?
Virginia’s IT infrastructure, which is managed by Northrop Grumman, has led to a few statements from agencies. Notably, Virginia’s Department of Motor Vehicles hasn’t been able to process requests for licenses and ID cards. These systems are supposed to be up and running on Tuesday, six days after the outages started to appear.
Meanwhile, the Virginia Information Technologies Agency (VITA) said in a statement that teams have been working throughout the weekend to restore data. In a nutshell, the IT infrastructure of the state of Virginia was reportedly crushed by an EMC storage area network failure. The Richmond Times-Dispatch reports that several systems are still down. The same paper said that Northrop Grumman will have to pay a fine for the failure. And the real kicker is that the state recently revised its contract with Northrop Grumman and extended the deal for three years. The state paid an additional $236 million for better service from Northrop Grumman.
Needless to say Virginia residents aren’t pleased. We’ve received a few emails and calls and the comments on the Richmond Times Dispatch site are summed up by this one:
Highlights of the Revised Contract
Operational Efficiencies
Consolidates and strengthens Performance Level Standards with a 15% increase in penalties across the board if Northrop Grumman fails to perform on clearly identified and measured performance standards. - PAY-UP
Improves Incident Response teams to determine technology failures and expedite repair - FAILED
Institutes clear performance measurements for Northrop Grumman that agencies can easily track - FAILED
Adds new services to contract such as improved disaster recovery and enhanced security features - FAILED
Among the key parts of the VITA statement:
- Successful repair to the storage system hardware is complete, and all but three or possibly four agencies out of the 26 agency systems have been restored. Agencies continue to perform verification testing.
- Progress continues, but work is not yet complete for the three or four agencies that have some of the largest and most complex databases. These databases make the restoration process extremely time consuming. The unfortunate result is the agencies will not be able to process some customer transactions until additional testing and validation are complete.
- According to the manufacturer of the storage system (EMC), the events that led to the outage appear to be unprecedented. The manufacturer reports that the system and its underlying technology have an exemplary history of reliability, industry-leading data availability of more than 99.999% and no similar failure in one billion hours of run time.
The official explanation for the outage leaves a bit to be desired and frankly doesn’t pass the sniff test. The outage was blamed on the failure of two circuit boards installed and maintained by EMC.
Simply put, it’s a bit disconcerting that two circuit boards can bring down a state’s IT infrastructure for nearly a week. Talk about a lack of redundancy.
Among the things that don’t add up in the Virginia IT outage:
We’re told that Northrop Grumman knows about its IT management issues and is working on correcting the problems. Northrop Grumman was awarded a $2.3 billion IT services contract in 2005. And the company has touted some of the state’s successes. Meanwhile, Northrop Grumman even relocated to Virginia. Hopefully, that proximity will lead to better IT management.
USENIX and LOPSA have partnered to provide LOPSA members a $45
discount to the USENIX LISA 2010 conference. To take advantage of the discount, enter the code into the Discount code field on the
LISA'10 registration form.
Throughout LISA registration, LOPSA will be offering a promotional membership rate of $35/year for new members or renewals. When you account for the LISA discount, this means if you become a member now, and attend LISA, you'll be getting paid $10 to attend your favorite System Administration conference, which should help you sell it to those who have to approve purchases.
LISA'10 is being held in San Jose, CA during the week of November 7–12, 2010.
The LISA conference is sponsored by USENIX in cooperation with LOPSA and other organizations.
A group of researchers consider popularity over-rated and have come up with a novel approach to make password guessing much more difficult.
—————————————————————————————–
I have previously reported on the work of Dr Cormac Herley, a researcher for Microsoft. His papers on why users should reject security advice and why userIDs may be more important than passwords were groundbreaking to say the least.
Dr. Herley, along with Dr. Stuart Schechter, also of Microsoft Research, and Dr. Michael Mitzenmacher of Harvard University, has stepped outside the box once again. The introduction to their new paper, “Popularity is everything” (pdf) foretells why:
“We propose to strengthen user-selected passwords against statistical-guessing attacks by allowing users of Internet-scale systems to choose any password they want so long as it’s not already too popular with other users.”
That grabbed my attention. Apparently, the team has found a better way to combat statistical-guessing attacks, or dictionary attacks, to us IT types. It involves:
The last bullet is a hint at what they’re trying to do. The team wants to create what they call a popularity oracle. If I understand correctly, it’s a web application where we would test our password choices, with the oracle advising us on our choice’s popularity.
You have to admit, the concept has merit. Then, I realized how difficult it would be to get something like this off the ground. Also, what if the bad guys gained control of the popularity oracle? Things would be worse than they are now.
Has the research team thought about this? I felt obligated to ask and here is what they had to say:
TechRepublic: Dr. Herley, you and the other researchers came up with an interesting hypothesis. Could you share a bit about how the idea came about?
Dr. Herley: Sure. Our starting point is that password policies (must be a certain length, include upper, lower, special chars etc) can be very frustrating for users. But web sites feel they must use them to prevent users from gravitating toward obvious and common passwords.
So we stepped back and asked ourselves: What are we really trying to accomplish here? It’s not that we desperately need everyone to stick an ‘&’ or ‘}’ in their password. What we want is to make it hard for an attacker to guess passwords. And attackers guess passwords by trying the obvious and more common ones first. We want to make that attack work less well.
Our idea is that password rules are an indirect and inefficient way of achieving this goal. They make it hard for users, and it’s not clear if they make a guessing attack difficult.
Suppose you have a website with 100 million users, and the 10 most popular passwords are used collectively at 1% of the accounts. That’s a potential 1 million accounts an attacker can get by trying this short list. Now suppose, instead, that no password can be used at more than 100 accounts. In this case, any list of ten gives an attacker only 1000 possible accounts. That’s what our scheme is aiming for.
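(An aside from the interview: the arithmetic in that example can be checked with a few lines of Python. The function name is made up for this illustration, and the figures are the ones Dr. Herley quotes.)

```python
def accounts_exposed(list_length, per_password_cap):
    """Upper bound on the accounts any list of `list_length` passwords can
    cover when no password may be used at more than `per_password_cap`
    accounts."""
    return list_length * per_password_cap


users = 100_000_000
without_cap = users * 1 // 100          # ten passwords covering 1% of accounts
with_cap = accounts_exposed(10, 100)    # ten passwords, at most 100 accounts each
```

Here `without_cap` works out to 1,000,000 accounts and `with_cap` to 1,000, which is the thousandfold reduction the scheme is aiming for.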
TechRepublic: Is it your contention that even though a password is strong, being popular makes it weak and highly guessable?
Dr. Herley: Yes. If a password is used by, say, 0.1% of all hotmail users then I can break 0.1% of the accounts if I know the usernames. It doesn’t matter if the password is ‘snoopy’ or ‘6g$9_35sd.’ Being common is a real problem, whether the password satisfies any definition of being strong or not.
TechRepublic: You mention that existing password-strength meters are less than adequate due to inconsistencies in how they measure strength. Could you explain that in more detail?
Dr. Herley: To be fair, measuring strength is a really hard job. It’s easy to check the length and whether it contains special characters, but it’s not so simple to check if it’s based on a dictionary word. So many strength meters will classify ‘P@$$w0rd’ as strong and ‘hdgopw’ as weak, while you’re probably a lot safer with the latter than the former.
Dr. Schechter: Password-strength meters use heuristics - simple rules - to guide users to supposedly stronger passwords. For example, some will report that a password is strong if it contains uppercase characters, digits, and special characters. As with any heuristic, there are times when they will come to the wrong conclusion.
TechRepublic: You suggest using a count-min sketch as the data structure to track password popularity. Your reasoning it seems is that it would be more efficient than just listing the passwords in a database. I have been trying to understand how a count-min sketch works, unsuccessfully I might add. I would appreciate your insight as to what they are and how they would be better than just a simple database of passwords.
Dr. Herley: Our approach is to rule out passwords that become too popular, so we need some way of keeping track of popularity. Now, we could just store a popularity table of passwords, but that would be insecure. If it leaks, it gives an attacker a roadmap of what to try first.
Of course we try to prevent it from leaking, but we don’t want to assume that. For example, an attacker could then just try all of the popular passwords on every username he can find. So we want to be more sophisticated than that.
You can think of the count-min sketch as an oracle that sometimes lies, but only in one direction. It keeps track of which passwords are popular and answers the question ‘is this password popular?’ If the password really is popular, it always answers truthfully, i.e., yes. But, if a password is not popular it sometimes, at random, answers yes.
These are the false positives. These are valuable, because now an attacker who has access to the count-min sketch doesn’t have an easy way to figure out which passwords are truly popular, and which ones are not. He can’t just use it to compile a list of which passwords he should try first.
Dr. Schechter: If an attacker compromised a simple database of passwords, he would immediately know which passwords are in use by the system’s users and which passwords were in use by the most users.
Our proposed data structure does not store the passwords themselves and tracks popularity only so far as to identify passwords that are too popular. It does not allow the attacker to determine which of the (large number of) passwords that we consider too popular are the most popular.
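(An aside from the interview, for readers who share the trouble picturing a count-min sketch: below is a minimal, generic Python version. The table width and depth, the use of salted SHA-256 as the hash family, and the `is_too_popular` helper are assumptions chosen for this illustration, not details taken from the paper.)

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates never undercount, but may overcount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        # depth rows of width counters; no item is ever stored, only counts.
        self.rows = [[0] * width for _ in range(depth)]

    def _indexes(self, item):
        # One salted hash per row; the row number acts as the salt.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item):
        for row, col in self._indexes(item):
            self.rows[row][col] += 1

    def estimate(self, item):
        # Collisions only ever inflate a cell, so the minimum is the best guess.
        return min(self.rows[row][col] for row, col in self._indexes(item))


def is_too_popular(sketch, password, threshold):
    # Hypothetical helper: may say yes for a rare password that collides with
    # popular ones in every row, but never says no for a truly popular one.
    return sketch.estimate(password) >= threshold
```

If "snoopy" is added 150 times, `estimate("snoopy")` is guaranteed to be at least 150, so a threshold of 100 always flags it. A rarely used password is usually reported as rare, but with a narrower table it is more often overcounted, which is the one-directional "lie" described above.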
TechRepublic: The paper mentions:
“Replacing password creation rules with popularity limits has the potential to increase both security and usability.”
Could you explain why you feel that way?
Dr. Herley: The way things are, attackers can be confident that a lot of people gravitate toward the most obvious passwords allowed by a site. So a dictionary or list of the most common choices is what they try. Our scheme makes things more secure by ensuring that no short list covers too many accounts.
Our hope is that it’s less inconvenient to be told occasionally that a password is too popular than to deal with the complex password rules users now face. This seems promising since, as I’ve said, the complexity rules are a very indirect way of making guessing attacks hard, so they needlessly forbid many choices that are probably perfectly fine passwords.
TechRepublic: Could you give us some idea as to when a password becomes dangerously popular?
Dr. Schechter: One might consider a password to be popular if more than one in a million people use it. For large systems with over a hundred million users, a password would become dangerously popular when over a hundred other people were already using it.
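The arithmetic behind that answer can be written out as a short sketch. The function names and the `one_in` parameter are hypothetical, chosen only to mirror the "one in a million" rule of thumb quoted above.

```python
def popularity_limit(total_users, one_in=1_000_000):
    """Number of accounts allowed to share one password under a 1-in-N rule."""
    return total_users // one_in


def is_dangerously_popular(users_of_password, total_users, one_in=1_000_000):
    """True once more accounts use the password than the limit allows."""
    return users_of_password > popularity_limit(total_users, one_in)


print(popularity_limit(100_000_000))             # 100
print(is_dangerously_popular(101, 100_000_000))  # True
```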
TechRepublic: In the paper, you were acutely aware of the leverage attackers would have if they had access to the password oracle. How would you make sure that would never happen?
Dr. Schechter: The password popularity oracle could be guarded in the same way that the existing password database is guarded. That said, we have designed the oracle expressly to minimize the consequences of compromise.
In fact, it does such a good job of hiding information that it may make sense to release the oracle for use by others, even if that means attackers could obtain a copy of it.
TechRepublic: The paper suggests a novel way to use a CAPTCHA application. Could you explain how it works?
Dr. Herley: A successful defense against statistical-guessing attacks requires not only the avoidance of popular passwords, but also mechanisms to limit the number of guesses an attacker can issue. Limiting guesses (using a CAPTCHA, for example) is especially important if users aren’t forced to avoid popular passwords, or if users with accounts that predate the no-popular-passwords policy have not been forced to select less-popular passwords.
If users knew that the appearance of a CAPTCHA means the password is too popular, that would be an added incentive to change the password to a less-popular one, removing the need to fill in a CAPTCHA at every login.
TechRepublic: The concept of using a popularity oracle is interesting. What would it take to have a workable system on a scale that a significant number of people could use?
Dr. Schechter: The data structure itself is rather straightforward. The challenges of integrating such a feature into a larger system are usually dominated more by the specifics of the larger system than the feature itself.
By definition, using a popularity oracle would reduce the bad guy’s odds of guessing correct passwords. Couple that process with using complex userIDs and bulk-guessing attacks become ineffective, a good thing.
I would be remiss if I didn’t take time to thank the researchers for helping explain the concept of a popularity oracle.
This is apparently the “time to schedule your conference trips” part of the year, because there is news on the SysAdmin conference front.

First, and most pressing, the LISA10 conference schedule has been released! I’ve got to say, I’m digging the theme of the website, too. More important, though, is the content. Interestingly, all sessions and tutorials are available in half-day increments this year. This means that you can attend the first half of one session then migrate to another session after lunch. I’ve got mixed feelings about this, but I’m interested in how it will pan out. More flexibility is nice, though, and sometimes the first half of a session is really review (though there are a lot of arguments against that, too).
As always, there are discounts available for certain groups, and you do get a lower admission price if you’re a member of LOPSA, USENIX, or SAGE.
Check out the registration page for the fees. There’s an early-bird special going on until October 18th, so make sure you register soon. The return on investment for this conference is amazing.
I’m going to be there as a conference blogger, along with Matthew Sacks, Ben Cotton, and Marius Ducea. We’ll be publishing entries on the USENIX blog (which I’ll be linking to from here as well, of course).
Come to LISA and have a great time. And if you do decide to come, find me and say hello. I always love meeting readers.
Shifting gears a little bit, I’m sure you remember the PICC conference that LOPSA-NJ hosted. Well, we had a blast, and last year’s conference chair, William Bilancio, did an amazing job. It’s a bit much to do that twice in a row, though, so he was looking for someone to take the responsibility for this year’s conference, and after running it through my head a while, I decided that I’d take the job if he thought I’d do alright. Here’s his email announcing it:
It is with a great sigh of relief that Matt Simmons has decided to be
the Program Chair for PICC ‘11.

Last year Matt was the head of the marketing team and did a great job
at getting the word out about the conference and was a key person in
making last years conference a success.

Tom and I feel that he will do a great job as the Program Chair and
will make PICC ‘11 a great conference.

In other news I will be getting in contact with the hotel and get the
date locked in, in the next few weeks and then we can start really
working on the conference.

Please start thinking about sponsor ideas as well as any new people
you think will be able to help make PICC ‘11 another great conference.

Again thank you Matt for taking PICC ‘11 Program Chair job and good luck.
William
I want to thank William and everyone who was involved with last year’s conference. Everyone I’ve talked to had a great time and has been looking forward to this coming year. I’m going to work hard to try to improve on William’s example, and really grow the community of system administrators in New Jersey and the rest of the northeast. I’m going to need help, though, so if you helped out last year, I’ll be calling on you now. If you weren’t involved last year, now is a great time. Drop me an email or comment on this story to let me know that you’re interested in volunteering. We can definitely use the help.
In addition, I was talking to Lee Damon, who let me know about a SysAdmin conference called “Cascadia IT Conference” (aka “CasITConf”), and it’s happening in the Pacific Northwest. It’s being put on by SASAG, the Seattle-Area System Administrators’ Guild.
So there you go. Three sysadmin conferences in one post. It’s going to be a busy year for everyone, so get involved and lend a hand to someone in your area!

Pomegranate is a novel distributed file system built over distributed tabular storage that acts an awful lot like a NoSQL system. It's targeted at increasing the performance of tiny object access in order to support applications like online photo and micro-blog services, which require high concurrency, high throughput, and low latency. Their tests seem to indicate it works:
We have demonstrate that file system over tabular storage performs well for highly concurrent access. In our test cluster, we observed linearly increased more than 100,000 aggregate read and write requests served per second (RPS).
Rather than sitting atop the file system like almost every other K-V store, Pomegranate is baked into the file system. The idea is that the file system API is common to every platform, so it wouldn't require a separate API to use. Every application could use it out of the box.
The features of Pomegranate are:
Can Ma, who leads the research on Pomegranate, was kind enough to agree to a short interview.
One would think that VMworld is all about virtualization, but before the show is underway the clear talk of the town is the upcoming 3PAR merger. In this blog post, IT pro Rick Vanover highlights some thoughts on the topic.
—————————————————————————————
Coming into VMworld 2010 in San Francisco, one would think that the big debate amongst attendees would revolve around cloud technologies. Another topic for discussion could be VMware’s recent vSphere 4.1 release. During the VMunderground Warm-Up-Party-As-A-Service reception, or WuPaaS, the big discussions I had revolved around the ongoing 3PAR acquisition drama. HP and Dell have been in a bidding war for 3PAR, and this is quite the discussion point for VMworld attendees before the show has even officially started.
In my conversations with various attendees at the WuPaaS reception, there were a number of opinions that were presented in favor of both parties. Should HP end up victorious in the 3PAR acquisition, this could allow HP to provide a high-end storage solution with its own design and engineering staff resources. HP, like Dell, currently engages in OEM relationships with its higher-end storage products. HP has other OEM storage relationships, including those with Dot Hill Systems for the modular storage array systems. I like HP to win the 3PAR battle to give HP a more competitive offering in the higher-end solutions. The question is, what will the execution look like? A number of other storage and infrastructure administrators may have questions about this very point based on HP’s IBRIX acquisition.
Should Dell end up the winning suitor, many of the WuPaaS attendees would see that as the better arrangement. For one, I’ll concede that Dell manages a slightly better channel than HP does with its various channel arrangements. In my opinion, that is important but not the only factor in play. Many people cite Dell’s recent acquisitions of EqualLogic, Exanet and Ocarina as part of a broad-reaching strategy to deliver a robust storage solution. If 3PAR is added into the mix, the solution is rather compelling up and down the stack. Consider also the potential that each of these solutions brings to a single sales channel, and Dell is on to something here.
It is clear that both HP and Dell can’t just plan on their servers making the storage sales. The competition is fierce, and these organizations are willing to shell out major dough to make a move. Where do you want 3PAR to end up? Share your comments below.
What do you do when you need to install a port with a reported vulnerability on FreeBSD?
One of the great things about FreeBSD is its security tools, and the fact that some of these tools are designed to keep the user informed. The article, “How FreeBSD makes vulnerability auditing easy: portaudit,” explains one of these tools, and how it can help the user maintain a secure system.
Normally, when you encounter a vulnerability reported by the portaudit tool, there is a patch for the reported vulnerability that can be installed to solve the problem. There are occasionally instances where one has to wait a little while for a fix to appear, however.
One’s first reaction to this might be to question why vulnerabilities are being left unaddressed when they are reported, but the fact of the matter is that, in some cases, fixing the relevant vulnerability is beyond the control of the FreeBSD core developers. The software in the ports system is primarily a collection of software outside of the FreeBSD base system that can be installed if the user desires it, developed and maintained by independent development projects and made available in the ports system by the efforts of people providing a convenience to FreeBSD users.
Far from being a failure on the part of the FreeBSD project, a port with a reported vulnerability and no available fix is in fact a result of the FreeBSD project’s contributors doing what they can to keep the user informed. Of course, a question arises: How should we handle security notifications? Is it better to keep quiet about a vulnerability until it is fixed in the hopes that knowledge of the vulnerability will be kept out of the hands of malicious security crackers, or to tell the people affected by the vulnerability so they can take any necessary measures to protect themselves?
In the case of the FreeBSD ports system, it may be a moot point. The vulnerabilities reported by the portaudit tool tend to be vulnerabilities that are already publicly known, even if some of them may not be widely known. It is a matter of degrees, however; any vulnerability, once it is discovered by a well-meaning security researcher, may already be known by malicious security hackers in any case. With this in mind, and given the situation of a vulnerability in an installed port, the reports offered by the portaudit tool do offer advice on how to deal with them:
Affected package: linux-f10-pango-1.22.3_1
Type of problem: pango — integer overflow.
Reference: http://portaudit.FreeBSD.org/4b172278-3f46-11de-becb-001cc0377035.html
1 problem(s) in your installed packages found.
You are advised to update or deinstall the affected package(s) immediately.
The advice in that last line, when it is at all reasonable to do so, is good advice to follow.
Unfortunately, there may be times when it is not such a reasonable choice. The Linux version of the pango library is a particular problem in this area; it seems to develop a new vulnerability every six months, and that vulnerability seems to get ignored all too often by the upstream maintainers at the Fedora project for far too long. Sometimes, it gets ignored indefinitely, when the Fedora project has moved on to a new OS release version and ceased supporting that particular version of the pango library.
In such cases, after reading the news at the reference URL provided for the vulnerability, a user may decide that keeping the library is necessary to ongoing operations and that the vulnerability in question is not a critical one under current usage. There may even be new versions of the library that are desired, but that still do not fix the vulnerability.
Of course, with portaudit on the job, the system will not normally allow the sysadmin to install vulnerable software. This includes updating software from one vulnerable version to another, too. What is a sysadmin to do?
Luckily, there is a solution to the problem — a way to tell the system to ignore the vulnerability warning and, darnit, do what you want it to do anyway. One should think very carefully before taking this approach, but the option is there if one needs it.
If you use the basic make tools for installing a port, the syntax for disabling the vulnerability check long enough to install a vulnerable version of a port is reasonably simple:
# cd /usr/ports/x11-toolkits/linux-f10-pango
# make -DDISABLE_VULNERABILITIES
# make install clean
Many prefer to use the portupgrade tool, available in ports, to handle installing and updating software. The same make option can be passed to the portupgrade utility, using the -m command line option for either the portupgrade command (which can be used to update an installed port or, with the -N option, to install a new port) or the portinstall command (which is just syntactic sugar that means the same thing as portupgrade -N):
# portinstall -m DISABLE_VULNERABILITIES=yes linux-f10-pango
Remember, of course, that this should be an act only of the last resort. The preferred option, when faced with a vulnerability in a piece of software, should be one of the following:
In some cases, it will even be preferable to simply uninstall the software and do without it altogether. The Linux version of the pango library is unfortunately necessary for a stable Flash plugin for the Firefox browser on FreeBSD, but you should ask yourself: Just how necessary is Flash right now? For many of us, it is not nearly as important as our first, knee-jerk reaction might suggest.
Before you point out that you do not have this problem on MS Windows or Apple MacOS X, you should know that you almost certainly do have an unfixed vulnerability problem with Flash on those systems, and probably with several other pieces of software as well. Over the years, Flash has (along with its dependencies) proven to be a common source of vulnerability issues. On systems without tools like portaudit, however, you have an additional problem: you often do not know that you are vulnerable.
Here is an irritation that gets me every so often: the Bourne shell has
no wildcard match operator that you can use in if checks and the like.
You can do wildcard matches, but only in case statements.
(Bash has the [[ =~ ]] operator, but it uses regular expressions
instead of shell wildcards. I know, I pick nits, but shell wildcards are
often simpler and they match what you use in other sh contexts.)
This comes up surprisingly often, at least in the sort of shell scripts
that I write. It's not insurmountable but it is inconvenient and it
can make my shell scripts read less clearly. Later shells, such as
Plan 9's rc, get this right and have built in wildcard matching and
non-matching operators, and I have wound up using them relatively
frequently.
(Yes, there is a workaround if you are doing this often enough.)
Of course, like a lot of things about the Bourne shell there are
historical and philosophical reasons for this. The biggest one is
a programming language design issue: you really want your wildcard
matching operator to have shell support so that you do not have to keep
quoting the wildcards themselves. Philosophically, the only good place
to put this in the Bourne shell is as part of explicit shell syntax (ie,
in a case statement); inventing a magic operator that didn't do shell
wildcard expansion when used as if it was a command would be at least
inconsistent.
(Tom Duff was willing to be this magical when creating rc,
fortunately. It may be inconsistent but it's very convenient.)
The difficulty is compounded because the natural place to put such an
operator is in test, and test started out as an external program,
not something built in to the shell. If not expanding wildcards in
something that looks like a command is odd in the Bourne shell, doing so
for some arguments to an external program is outright serious magic.
PS: expr is not an adequate substitute for various reasons.
case conditions will do variable expansion and then, if the variable
expands to a wildcard, do wildcard matching on the result. So the simple
way around this is to define a function:
match() {
    case "$1" in
        $2) return 0;;
    esac
    return 1
}
Then you can say 'if match $var "*.c"' and the like. If you have to
you can even write vaguely crazy things like 'if match $var "*.c" &&
[ -f $var ];'.
I don't like repeating myself, but I'm very tempted to paste my mini-review of the Roomba Vacuum Cleaner robot into this blog.
Instead I will practise restraint and summarise:
£250. Worth. Every. Penny.
In more Debian-friendly news I've been fighting HTTP proxies today. I've noticed a lot of visitors to the various websites I host are logged as 127.0.0.1 - which is an irritation. My personal machine looks like this:
Internet -> Apache listening on *:80 -> thttpd on 127.0.0.1:xxxx
(This has been documented previously - primarily it is a security restriction. It means I can run per-UID web-servers.)
I had previously added a patch to thttpd to honour the X-Forwarded-For: header - so that it would receive the correct remote address passed on from Apache. However the fact that so many visitors are logged as coming from 127.0.0.1 meant it wasn't working 100% correctly, and I wanted to understand why.
Today I used ngrep to capture the incoming headers and the source of the problem became apparent:
skx:~# ngrep -d lo X-For ' port 1007'
..
T 127.0.0.1:41886 -> 127.0.0.1:1007 [AP]
GET /about/ HTTP/1.1..Host: images.steve.org.uk..If-Modified-Since: Mon, 07 Jun 2010 15:24:33 GMT..User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100701 Iceweasel/3.5.10 (like Firefox/3.5.10)..Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8..Accept-Language: en-us,en;q=0.5..Accept-Encoding: gzip,deflate..Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7..Referer: http://images.steve.org.uk/2009/11/20/img_0471.html..X-Forwarded-For: 127.0.0.1, 11.22.33.123..Cache-Control: max-age=0..X-Forwarded-Host: images.steve.org.uk..X-Forwarded-Server: images.steve.org.uk..Connection: Keep-Alive....
I bolded the important input; just in case that didn't jump out it was:
X-Forwarded-For: 127.0.0.1, 11.22.33.123
My patch to thttpd was making it read the first address, rather than the second - which meant that requests were being logged as coming from 127.0.0.1 and avoiding my efforts to track sources.
Now I understand the problem - The X-Forwarded-For header is being tweaked by a proxy server, such as Squid, upstream of my server.
For the moment I've updated the thttpd patch to read:
else if ( strncasecmp( buf, "X-Forwarded-For:", 16 ) == 0 )
    {
    /* Jump to the header-value */
    cp = &buf[16];
    cp += strspn( cp, " \t" );
    /*
     * If the first entry is 127.0.0.1, then we'll
     * jump over it. Cope with Squid, et al.
     */
    if ( strncmp( cp, "127.0.0.1, ", strlen( "127.0.0.1, " ) ) == 0 )
        cp += strlen( "127.0.0.1, " );
    /* Parse the IP */
    inet_aton( cp, &(hc->client_addr.sa_in.sin_addr) );
    }
That's not perfect, but the alternative would be:
Or something equally hacky and security-by-obscurity-alike.
Really I just want a simple way of always getting the correct remote IP. Shouldn't be so hard, should it? *pout*.
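For comparison, here is the same selection logic as a small Python sketch rather than a thttpd patch. It assumes that each proxy appends the address it saw to the end of the list, so the right-most non-loopback entry is the one added by the proxy closest to the server. The function name and the fallback value are made up for this example.

```python
import ipaddress


def client_from_xff(header_value, fallback="127.0.0.1"):
    """Return the right-most non-loopback address in an X-Forwarded-For value."""
    for part in reversed(header_value.split(",")):
        candidate = part.strip()
        try:
            addr = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # skip anything that is not a literal IP address
        if not addr.is_loopback:
            return str(addr)
    return fallback


print(client_from_xff("127.0.0.1, 11.22.33.123"))  # 11.22.33.123
```

Walking the list from the right matters because everything to the left of the entry added by a trusted proxy is supplied by the client or by unknown intermediaries, and can be forged.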
ObQuote: "You don't mess with fate, Peanut. People die when they are meant to die. There's no discussion. There's no negotiation. When life's done, it's done." - Dead Like Me.
This morning John Troyer coordinated a bunch of bloggers for a session over at the VMworld 2010 Hands-On Lab facilities in Moscone West. Adam Zimman, Dan Anderson, and Curtis Pope took turns explaining and demoing the lab to us. The lab itself was built as a cloud-oriented system, using software-on-demand and service-on-demand principles, and relying heavily on remotely-hosted equipment in data centers in Miami, FL (Terremark) and Ashburn, VA (Verizon).
The Lab team is really building on what they’ve learned from other years. There are many more labs this year than last, and they’re all self-paced, though there are options for instructor interaction as well if you have questions or want more one-on-one guidance. Self-paced labs means they can do almost unlimited content, and it’s easier to get lots of people through the labs. Last year they had, all totalled, about 7000 lab seat hours. This year they have almost 20,000, with 480 View stations in eight rooms. Dan Anderson, the lab’s lead architect, had some proud things to say about what they’ve done. “The content is killer, the best content I’ve seen yet. If someone sits for four days, eight hours a day, they might be able to get through all of them. But nobody can complain about not having enough stick time,” said Dan.
Perhaps he’s never met some of the curmudgeonly people that attend VMworld. :) But I really appreciate the iterative approach they’ve taken this year to making the labs better. For instance, they learned that pre-registration for the labs didn’t work very well in other years, so it’s all first-come, first-served (FIFO). There’s a check-in station that works with your badge number, and a waiting room with couches and whiteboards and Subject Matter Experts while you wait. The labs will be open from 8 AM until 10 PM every day, too, and they will be offering a prize to “dedicated individuals” (they thought speed and quantity might be the factors, but it isn’t set in stone). They did say the prizes would be something like a pass to VMworld 2011, though, which is very cool.
The hardware and software powering the lab is pretty amazing, with a number of sponsors contributing staff, equipment, and software to make it run. Sometimes on very short notice, too. And in some cases this lab is the largest deployment yet of these technologies. They’re pre-populating lab environments with instances of each lab setup, to avoid the on-demand 5 to 7 minute wait from last year, which is great. They’re worried that they’ll have the prepopulation levels off a little on the first day, but even if you do get caught waiting you can still read the manuals. They estimate that the labs are using roughly 36 TB of RAM (yes, TB) and there’s about 200 TB of storage, between EMC and NetApp, in each data center powering the labs, all connected via NFS. The storage itself is everything from enterprise flash (EFD) to SATA, with the EFD often being used as FastCache to front-end the slower storage.
The stations themselves are Wyse thin clients, with dual monitors and even dual chairs, even though it’s geared for one-on-one learning. It’s all about flexibility and options, which extends to the content itself — the vSphere Sandbox lab is just a deployment of all of their products, for freeform messing around. The Lab team even has redundant wiring to the lab stations, just in case they need it (“We even have redundant chairs!” said Adam). They’re flexible, they’re ready, and they’re hoping that they can set records for the number of happy people in the labs this year. And if you’re not happy, there’s 150 staff floating around to help you out, as well as two lab captains per room.
I’m looking forward to it — labs have always been a highlight of VMworld for me, and these guys are making it even better. I know it’s a lot of work to build, in two months, what usually would be done in a year or two (and then, as Dan said, “throw it on a truck.”). On behalf of all of us, thank you Adam, Dan, and Curtis (and all the others that we didn’t meet). I hope all your hard work is a giant success!
This post written by Bob Plankers for The Lone Sysadmin. Unless otherwise noted it is © 2010 Bob Plankers and licensed under the Creative Commons BY-NC-SA 3.0 license.