Planet SysAdmin

January 17, 2017

Despite revoked CAs, StartCom and WoSign continue to sell certificates

As it stands, the HTTPS "encrypted web" is built on trust. We use browsers that trust that Certificate Authorities secure their infrastructure and deliver TLS certificates (1) after validating and verifying the request correctly.

It's all about trust. Browsers trust those CA root certificates and in turn, they accept the certificates that the CA issues.

(1) Let's all agree to never call it SSL certificates ever again.

Revoking trust

Once in a while, Certificate Authorities misbehave. They might have bugs in their validation procedures that have led to TLS certificates being issued to requesters who had no access to the domains in question. It's happened for Gmail, ... you can probably guess the likely targets.

When that happens, an investigation is performed -- in the open -- to ensure the CA has taken adequate measures to prevent it from happening again. But sometimes, those CAs don't cooperate. As is the case with StartCom (StartSSL) and WoSign, whose certificates will start to show as invalid in the next Chrome update.

Google has determined that two CAs, WoSign and StartCom, have not maintained the high standards expected of CAs and will no longer be trusted by Google Chrome, in accordance with our Root Certificate Policy.

This view is similar to the recent announcements by the root certificate programs of both Apple and Mozilla.

Distrusting WoSign and StartCom Certificates

So Apple (Safari), Mozilla (Firefox) and Google (Chrome) are about to stop trusting the StartCom & WoSign TLS certificates.

From that point forward, those sites will show certificate warnings.

With Mozilla, Chrome & Safari, that's 80% of the browser market share blocking those Certificate Authorities.

Staged removal of CA trust

Chrome is handling the update sensibly: it'll start by distrusting the most recent certificates first, and gradually block the entire CA.

Beginning with Chrome 56, certificates issued by WoSign and StartCom after October 21, 2016 00:00:00 UTC will not be trusted. [..]

In subsequent Chrome releases, these exceptions will be reduced and ultimately removed, culminating in the full distrust of these CAs.

Distrusting WoSign and StartCom Certificates

If you purchased a TLS certificate from either of those 2 CAs in the last 2 months, it won't work in Chrome, Firefox or Safari.

Customer Transparency

Those 3 browsers have essentially just bankrupted those 2 CAs. Surely, if your certificates are not going to be accepted by 80% of the browsers, you're out of business -- right?

Those companies don't see it that way, apparently, as they still sell new certificates online.

This is pure fraud: they're willingly selling certificates that are known to stop working in all major browsers.

Things like that piss me off, because only a handful of IT experts know that those Certificate Authorities are essentially worthless. But they're still willing to accept money from unsuspecting individuals wishing to secure their sites.

I guess they proved once again why they should be distrusted in the first place.

Guilt by Association

Part of the irony is that StartCom, which runs StartSSL, didn't actually do anything wrong. But a few years ago, they were bought by WoSign. In that process, StartCom replaced its own process and staff with those of WoSign, essentially copying the bad practices that WoSign had.

If StartCom hadn't been bought by WoSign, they'd still be in business.

I'm looking forward to the days when we have an easy-to-use, secure, decentralized alternative to Certificate Authorities.

by Mattias Geniar at January 17, 2017 08:30 AM

Simon Lyall 2017 – Tuesday – Session 3

The Internet of Scary Things – tips to deploy and manage IoT safely Christopher Biggs

  • What you need to know about the Toaster Apocalypse
  • Came to prominence in late 2016 when major sites were hit by DDoS attacks from compromised devices
  • Risks of images being grabbed
    • Targeted intrusion
    • Indiscriminate harvesting of images
    • Drive-by pervs
    • State actors
  • Unauthorized control
    • Hit traffic lights, doorbells
  • Takeover of entire devices
    • Used for DDOS
    • Demanding payment for the owner to get control of them back.
  • “The firewall doesn’t divide the scary Internet from the safe LAN, the monsters are in the room”


  • Poor Security
    • Mostly just laziness and bad practices
    • Hard for end-users to configure (especially non-techies)
    • Similar to where servers, Internet software and PCs were 20 years ago
  • Low Interop
    • Everyone uses own cloud services
    • Only just started getting common protocols and standards
  • Limited Maintenance
    • No support, no updates, no patches
  • Security is Hard
  • Laziness
    • Threat surface is too large
    • Telnet is too easy for devs
    • Most things don’t need full Linux installs
  • No incentives
    • Owner might not even notice if compromised
    • No incentive for vendors to make them better


  • Examples
    • Cameras with telnet open, default passwords (that cannot be changed)
    • exe to access
    • Send UDP to enable a telnet port
    • Bad Mobile apps


  • Selecting a device
    • Accept you will get bad ones, will have to return
    • Scan your own network, you might not know something is even wifi enabled
    • Port scan devices
    • Stick with the “Big 3” frameworks (Apple, Google, Amazon)
    • Make sure it supports open protocols (indicates serious vendor)
    • Check if open source firmware or clients exist
    • Check for reviews (especially negative) or teardowns


  • Defensive arch
    • Put on its own network
    • Turn off or block UPnP opening firewall holes
    • Plan for breaches
      • Firewall rules, rate limited, recheck now and then
    • BYO cloud (don’t use the vendor cloud)
      • HomeBridge
      • Node-RED (Alexa)
      • Zoneminder, Motion for cameras
  • Advice for devs
    • Apple HomeKit (or at least support for Homebridge for less commercial)
    • Amazon Alexa and AWS IoT
      • Protocols open but look nice
    • UCF uPnP and SNP profiles
      • Device discovery and self discovery
      • Reference implementations available
    • NoApp setup as an alternative
      • Have an API
    • Support MQTT
    • Long Term support
      • Put copy of docs in device
      • Decide what and for how long you will support it, and be up front about it
    • Limit what you put on the device
      • Don’t just ship a Unix PC
      • Take out debug stuff when you ship
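
The “own network” plus firewall rules advice above can be sketched with iptables. Everything here is hypothetical: the interface names (iot0, lan0, wan0), the idea of a dedicated IoT VLAN, and the rate limit are illustrative, not a recommendation for any particular device.

```shell
# Hypothetical layout: IoT devices on their own network segment (iot0),
# trusted machines on lan0, upstream Internet on wan0.

# Block connections initiated from the IoT side into the trusted LAN...
iptables -A FORWARD -i iot0 -o lan0 -m state --state NEW -j DROP
# ...but let the LAN reach the devices, and allow replies back.
iptables -A FORWARD -i lan0 -o iot0 -j ACCEPT
iptables -A FORWARD -i iot0 -o lan0 -m state --state ESTABLISHED,RELATED -j ACCEPT

# Rate-limit new outbound TCP connections from IoT devices, as damage
# control if one is conscripted into a DDoS botnet.
iptables -A FORWARD -i iot0 -o wan0 -p tcp --syn -m limit --limit 10/second -j ACCEPT
iptables -A FORWARD -i iot0 -o wan0 -p tcp --syn -j DROP
```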


  • Trends
    • Standards
      • BITAG
      • Open Connectivity Foundation
      • Regulation?
    • Google Internet of things
    • Apple HomeKit
    • Amazon Alexa
      • Worry about privacy
    • Open Connectivity Foundation – IoTivity
      • Open source etc
      • Linux and Docker based
    • Consumer IDS – FingBox
  • Missing
    • Network access policy framework shipped
    • Initial network authentication
    • Vulnerability alerting
    • Patch distribution

Rage Against the Ghost in the Machine – Lilly Ryan

  • What is a Ghost?
    • The split between the mind and the body (dualism)
    • The thing that makes you you, separate to the meat of your body
  • Privacy
    • Privacy for information, not physical things
    • The mind has been a private place
    • eg “you might have thought about robbing a bank”
    • The thoughts we express are what is public.
    • Always been private since we never had technology to get in there
    • Companies and governments can look into your mind via things like your google queries
    • We can emulate the inner person not just the outer expression
  • How to Summon a Ghost
    • Digital re-creation of a person by a bot or another machine
    • Take information that you post online
    • Likes on facebook, length of time between clicks
  • Ecto-meta-data
    • Take meta data and create something like you that interacts
  • The Smartphone
    • Collects meta-data that doesn’t get posted publicly
    • deleted documents
    • editing of stuff
    • search history
    • pattern of jumping between apps
  • The Public meta-data that you don’t explicitly publish
    • In future, the sum of your public behaviour could be used to emulate you
  • What do we do with a ghost?
    • Create chatbots or online profiles that emulate a person
    • Talk to a Ghost of yourself
    • Put a Ghost to work. The 3rd party owns the data
    • Customer service bot, PA
    • Chris Hemsworth could be your PA
    • Money will go to facebook or Google
  • Less legal stuff
    • Information can leak from big companies
  • How to Banish a Ghost
    • Option to donating to the future
    • currently no regulation or code of conduct
    • Restrict data you send out
      • Don’t use the Internet
      • Be anonymous
      • Hard to do when cookies match you across many sites
        • You can install cookie blocker
    • Which networks you connect to
      • eg list of Wifi networks match you with places and people
      • Mobile network streams location data
      • location data reveals not just where you go but what stores, houses or people you are near
      • Turn off wifi, bluetooth or data when you are not using. Use VPNs
    • Law
      • Lobby and push politicians
      • Push back on companies
    • For technologists
      • Collect the minimum, not the maximum

FreeIPA project update (turbo talk) – Fraser Tweedale

  • Central Identity manager
  • LDAP + Kerberos, CA, DNS, admin tools, client. Hooks into AD
  • Manage via web or client
  • Client SSSD. Used by various distros
  • What is in the next release
    • Sub-CAs
    • Can require 2FA for important services
    • KDC Proxy
    • Network bound encryption, i.e. needs to talk to a local server to decrypt a disk
    • User Session recording


Minimum viable magic

Politely socially engineering IRL using sneaky magician techniques – Alexander Hogue

  • Putting things up your sleeve is actually hard
  • Minimum viable magic
  • Misdirect the eyes
  • Eyes only move in a straight line
  • Exploit pattern recognition
  • Exploit the spot light
  • Your attention is a resource


by simon at January 17, 2017 06:25 AM

Chris Siebenmann

Making my machine stay responsive when writing to USB drives

Yesterday I talked about how writing things to USB drives made my machine not very responsive, and in a comment Nolan pointed me to LWN's The pernicious USB-stick stall problem. According to LWN's article, the core problem is an excess accumulation of dirty write buffers, and they give some VM system sysctls that you can use to control this.

I was dubious that this was my problem, for two reasons. First, I have a 16 GB machine and I barely use all that memory, so I thought that allowing a process to grab a bit over 3 GB of them for dirty buffers wouldn't make much of a difference. Second, I had actually been running sync frequently (in a shell loop) during the entire process, because I have sometimes had it make a difference in these situations; I figured frequent syncs should limit the amount of dirty buffers accumulating in general. But I figured it couldn't hurt to try, so I used the dirty_background_bytes and dirty_bytes settings to limit this to 256 MB and 512 MB respectively and tested things again.

It turns out that I was wrong. With these sysctls turned down, my machine stayed quite responsive for once, despite me doing various things to the USB flash drive (including things that had had a terrible effect just yesterday). I don't entirely understand why, though, which makes me feel as if I'm doing fragile magic instead of system tuning. I also don't know if setting these down is going to have a performance impact on other things that I do with my machine; intuitively I'd generally expect not, but clearly my intuition is suspect here.
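
For reference, those two settings translate to byte values like this (a sketch: applying them requires root, and sysctl -w changes don't persist across reboots):

```shell
# 256 MB of dirty data before background writeback kicks in, and a
# 512 MB hard limit where writers block (the values from this entry).
dirty_bg=$((256 * 1024 * 1024))   # 268435456
dirty_max=$((512 * 1024 * 1024))  # 536870912
echo "vm.dirty_background_bytes = $dirty_bg"
echo "vm.dirty_bytes = $dirty_max"
# To actually apply them (as root):
#   sysctl -w vm.dirty_background_bytes="$dirty_bg"
#   sysctl -w vm.dirty_bytes="$dirty_max"
```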

(Per this Bob Plankers article, you can monitor the live state of your system with egrep 'dirty|writeback' /proc/vmstat. This will tell you the number of currently dirty pages and the thresholds (in pages, not bytes). I believe that nr_writeback is the number of pages actively being flushed out at the moment, so you can also monitor that.)

PS: In a system with drives (and filesystems) of vastly different speeds, a global dirty limit or ratio is a crude tool. But it's the best we seem to have on Linux today, as far as I know.

(In theory, modern cgroups support the ability to have per-cgroup dirty_bytes settings, which would let you add extra limits to processes that you knew were going to do IO to slow devices. In practice this is only supported on a few filesystems and isn't exposed (as far as I know) through systemd's cgroups mechanisms.)

by cks at January 17, 2017 05:37 AM

Simon Lyall 2017 – Tuesday – Session 2

Stephen King’s practical advice for tech writers – Rikki Endsley

  • Example What and Whys
    • Blog post, press release, talk to managers, tell devs the process
    • 3 types of readers: Lay, Managerial, Experts
  • Resources:
    • Press: The care and Feeding of the Press – Esther Schindler
    • Documentation: RTFM? How to write a manual worth reading


  • “On Writing: A memoir of the craft” by Stephen King
  • Good writing requires reading
    • You need to read what others in your area or topic or competition are writing
  • Be clear on Expectations
    • See examples
    • Howto Articles by others
    • Writing an Excellent Post-Event Wrap Up report by Leslie Hawthorn
  • Writing for the Expert Audience
    • New Process for acceptance of new modules in Extras – Greg DeKoenigsberg (Ansible)
    • vs Ansible Extras Modules + You – Robyn Bergeron
      • Defines audience in the intro


  • Invite the reader in
  • Opening Line should Invite the reader to begin the story
  • Put in an explicit outline at the start


  • Tell a story
  • That is the object of the exercise
  • Don’t do other stuff


  • Leave out the boring parts
  • Just provide links to the details
  • But sometimes, if people are not experts, you need to provide more detail


  • Sample outline
    • Intro (invite reader in)
    • Brief background
    • Share the news (explain solution)
    • Conclude (include important dates)


  • Sample Outline: Technical articles
  • Include a “get technical” section after the news.
  • Too much stuff to copy all down, see slides


  • To edit is divine
  • Come back and look at it afterwards
  • Get somebody who will be honest to do this


  • Write for


  • Q: How do you deal with skimmers?   A: Structure, headers
  • Q: Pet Peeves?  A: Strong intro, People using “very” or “some” , Leaving out important stuff




by simon at January 17, 2017 04:06 AM

Sarah Allen

remaining awake through a revolution

This day we honor Martin Luther King, Jr. who eloquently described the change that was, and is still, happening in our society. He often referred to the dramatic changes in technology, alongside other changes which require actions from each of us to make happen.

This morning I listened to “Remaining Awake Through a Great Revolution,” a speech delivered by Dr. Martin Luther King, Jr. on March 31 1968. He spoke of a triple revolution with advances in technology, weapons, and human rights. He talks about how we as individuals must accept responsibility and create change, not just in our own behavior, but changing the institutions we are part of.

one of the great liabilities of life is that all too many people find themselves living amid a great period of social change, and yet they fail to develop the new attitudes, the new mental responses, that the new situation demands.

His introduction refers to the story of Rip Van Winkle. We all remember how he slept for 20 years, but I had forgotten exactly what he slept through. He went to sleep under the reign of King George and woke up when George Washington was President — he slept through the American revolution. This story is an apt metaphor for today’s political and social climate. If we don’t talk together about what is happening in our world and work together to make change, we are sleeping. In King’s words: “anyone who feels that he can live alone is sleeping through a revolution.”

Here are some highlights of this speech that are still true today, and inspire me to work towards the kind of world where I want to live, which I believe is still possible:

Through our scientific and technological genius, we have made of this world a neighborhood and yet we have not had the ethical commitment to make of it a brotherhood. But somehow, and in some way, we have got to do this. We must all learn to live together as brothers or we will all perish together as fools. We are tied together in the single garment of destiny, caught in an inescapable network of mutuality. And whatever affects one directly affects all indirectly. For some strange reason I can never be what I ought to be until you are what you ought to be. And you can never be what you ought to be until I am what I ought to be. This is the way God’s universe is made; this is the way it is structured…

We need all of the talents and potential of the people in the world to solve the challenges that face us. Let’s look out for the individuals in our daily lives who aren’t getting the opportunities to rise to their potential.

It is an unhappy truth that racism is a way of life for the vast majority of white Americans, spoken and unspoken, acknowledged and denied, subtle and sometimes not so subtle—the disease of racism permeates and poisons a whole body politic. And I can see nothing more urgent than for America to work passionately and unrelentingly—to get rid of the disease of racism.

The opportunity to speak out against racism rises up without warning. I have found myself often unprepared to speak in the moment, and so have worked on practices which cause me to be mindful and take small, quiet actions in my daily life. I volunteer for Bridge Foundry, learning how to work with diverse teams, teaching what I’ve learned to make our tech culture more inclusive and welcoming to people who have traditionally been excluded. I’ve learned about history, so I can tell lesser-known stories, and try to pay attention to present-day voices that deserve to be amplified. Often when I’m about to share an article, I take a little extra time to look up the person who wrote it. I think about how this person’s experience and culture intersect with mine. I do a little more digging and read a few more articles and sometimes choose to share a different one. I enjoy finding new voices. I seek to be intentional about the people who influence me.

we have difficult days ahead in the struggle for justice and peace, but I will not yield to a politic of despair. I’m going to maintain hope… This time we will really confront a Goliath. God grant that we will be that David of truth set out against the Goliath of injustice, the Goliath of neglect, the Goliath of refusing to deal with the problems, and go on with the determination to make America the truly great America that it is called to be.

The world is changing, always. We need to work together, and I’m not just referring to a mass movement to curb injustice and stand up for what’s right (though I hope to be part of that). I believe we need to look for ways to work together as individuals, to speak up in the moment, to address the small injustices that we witness (and participate in) every day.

I don’t intend to appropriate the words of Dr. Martin Luther King. This speech was as much about peace, as it was about racial injustice. It is my hope that with this small blog post I might highlight how his teachings are still very applicable today. I hope someone will be inspired to read or listen to the whole original speech, and that everyone will be inspired to and feel obliged to create positive change in the world.

With this faith we will be able to hew out of the mountain of despair the stone of hope. With this faith we will be able to transform the jangling discords of our nation into a beautiful symphony of brotherhood.

The post remaining awake through a revolution appeared first on the evolving ultrasaurus.

by sarah at January 17, 2017 01:23 AM

Simon Lyall 2017 – Tuesday Session 1

Fishbowl discussion – GPL compliance Karen M. Sandler

  • Fishbowl format
    • 5 seats at front of the room, 4 must be occupied
    • If person has something to say they come up and sit in spare chair, then one existing person must sit down.
  • Topics
    • Conflicts of Law
    • Mixing licences
    • Implied warranty
    • Corporate Procedures and application
    • Get knowledge of free licences into the law school curriculum
  • “Being the Open Source guy at Oracle has always been fun”
  • “Our large company has spent 2000 hours with a young company trying to fix things up because their license is not GPL compliant”
  • BlackDuck is a commercial company that will review your company’s code looking for GPL violations. Some others too
    • “Not a perfect magical tool by any stretch”
    • Fossology is alternative open tool
    • Whole business model around license compliance, mixed in with security
    • Some of these companies are Kinda Ambulance chasers
    • “Don’t let those companies tell you how to run your business”
    • “Compliance industry complex” , “Compliance racket”
  • At my employer we have a tool that just greps for a “GPL” license in code, better than nothing.
  • Lots of fear in this area over Open-source compliance lawsuits
    • Disagreements in community if this should be a good idea
    • More, Less, None?
    • “As a Lawyer I think there should definitely be more lawsuits”
    • “A lot of large organisations will ignore anything less than [a lawsuit] “
    • “Even today I deal with organisations who reference the SCO period and fear widespread lawsuits”
  • Have Lawsuits chilled adoption?
    • Yes
    • Chilled adoption of free software vs GPL software
    • “Android has a policy of no GPL in userspace” , “they would replace the kernel if they could”
    • “Busybox lawsuits were used as a club to get specs so the kernel devs could create drivers” , this is not really applicable outside the kernel
    • “My goal in doing enforcement was to ensure somebody with a busybox device could compile it”
    • “Lawyers hate any license that prevents them getting future work”
    • “The amount of GPL violations skyrocketed with embedded devices shipping with Linux and GPL software”
  • People are working on a freer (eg “Not GPL”) embedded stack to replace Android userspace: Toybox, Toolbox. No kernel replacement yet.
  • Employees and Compliance
    • A large company helping out with a charity’s systems was unable to put AGPL software from that company on their laptops
    • “Contributing software upstream makes you look good and makes your company look good” , Encourages others and you can use their contributions
    • Work you do on your volunteer days at the company does not fall under the software assignment policy etc, but they still can’t install random stuff on their machines.
  • Websites often are not GPL compliant: heavy restrictions, users giving up their licenses.
  • “Send your lawyers a video of another person in a suit talking about that topic”

U 2 can U2F Rob N ★

  • Existing devices are not terribly secure, but better than nothing; usability sucks
  • Universal Two-Factor
    • Open Standard by FIDO alliance
    • USB, NFC, Bluetooth
    • Multiple server and host implementations
    • One device, multiple sites
    • Cloning protection
  • Interesting Examples
  • User experience: Login, press the button twice.
  • Under the hood a lot more complicated
    • Challenge from site, device must sign challenge (including website URL, to prevent a phishing site proxying)
    • Multiple keypairs for each website on device
    • Has a login counter on the device included in the signature, so the server can panic when the counter gets out of sync from a cloned device
  • Attestation Certificate
    • Shared across model or production batch
  • Browserland
    • Javascript
    • Chrome-based support is good
    • Firefox via extension (Native “real soon now”)
    • Mobile works on Android + Chrome + Google Authenticator


by simon at January 17, 2017 01:17 AM

January 16, 2017

Google Infrastructure Security Design Overview

This is quite a fascinating document highlighting everything (?) Google does to keep its infrastructure safe.

And to think we're still trying to get our users to generate random, unique, passphrases for every service.

Secure Boot Stack and Machine Identity

Google server machines use a variety of technologies to ensure that they are booting the correct software stack. We use cryptographic signatures over low-level components like the BIOS, bootloader, kernel, and base operating system image. These signatures can be validated during each boot or update. The components are all Google-controlled, built, and hardened. With each new generation of hardware we strive to continually improve security: for example, depending on the generation of server design, we root the trust of the boot chain in either a lockable firmware chip, a microcontroller running Google-written security code, or the above mentioned Google-designed security chip.

Each server machine in the data center has its own specific identity that can be tied to the hardware root of trust and the software with which the machine booted. This identity is used to authenticate API calls to and from low-level management services on the machine.

Source: Google Infrastructure Security Design Overview

by Mattias Geniar at January 16, 2017 09:50 PM

Everything Sysadmin

RIP John Boris

John was active in the LOPSA community. I saw him at nearly every LOPSA-NJ meeting, where he was active in planning and hosting the meetings. He was also on the board of LOPSA (national) where he will be greatly missed.

John was also a football coach at the school where he worked in the IT department. It was very clear that his coaching skills were something he applied everywhere, including his helpfulness and mentoring at LOPSA.

I had a feeling that when I hugged him at the end of the January LOPSA meeting it might be the last time I saw him. He was recovering from bypass surgery and was looking worn. He was chipper and friendly as always. He was a good guy. Easy to get along with. He kept LOPSA-NJ and many other projects going.

John Boris passed away last night.

I'll miss him.

by Tom Limoncelli at January 16, 2017 04:00 PM

WordPress to get secure, cryptographic updates

Exciting work is being done with regards to the WordPress auto-update system that allows the WordPress team to sign each update.

That signature can be verified by each WordPress installation to guarantee you're installing the actual WordPress update and not something from a compromised server.

Compromising the WordPress Update Infrastructure

This work is being led by security researcher Scott Arciszewski from Paragon Initiative, a long-time voice in the PHP security community. He's been warning about the potential dangers of the WordPress update infrastructure for a very long time.

Scott and I discussed it in the SysCast podcast about Application Security too.

Since WordPress 3.7, support has been added to auto-update WordPress installations in case critical vulnerabilities are discovered. I praised them for that -- I really love that feature. It requires 0 effort from the website maintainer.

But that obviously poses a threat, as Scott explains:

Currently, if an attacker can compromise, they can issue a fake WordPress update and gain access to every WordPress install on the Internet that has automatic updating enabled. We're two minutes to midnight here (we were one minute to midnight before the Wordfence team found that vulnerability).

Given WordPress's ubiquity, an attacker with control of 27% of websites on the Internet is a grave threat to the security of the rest of the Internet. I don't know how much infrastructure could withstand that level of DDoS.
#39309: Secure WordPress Against Infrastructure Attacks

Scott has already published several security articles, with Guide to Automatic Security Updates For PHP Developers arguably being the most important one for anyone designing and creating a CMS.

Just about every CMS, from Drupal to WordPress to Joomla, uses a weak update mechanism: if an attacker manages to take control of the update server(s), there's no additional proof they need in order to issue new updates. This poses a real threat to the stability of the web.

Securing auto-updates

For WordPress, a federated authentication model is proposed.

It consists of 3 key areas, as Scott explains:

1. Notaries (WordPress blogs or other services that opt in to hosting/verifying the updates) will mirror a Merkle tree which contains (with timestamps and signatures):
--- Any new public keys
--- Any public key revocations
--- Cryptographic hashes of any core/extension updates

2. WordPress blogs will have a pool of notaries they trust explicitly. [...]

3. When an update is received from the server, after checking the signature against the WP core's public key, they will poll at least one trusted Notary [..]. The Notary will verify that the update exists and matches the checksum on file, and respond with a signed message containing:
--- The challenge nonce
--- The response timestamp
--- Whether or not the update was valid

This will be useful in the event that the core signing key is ever compromised by a sophisticated adversary: If they attempt to issue a silent, targeted update to a machine of interest, they cannot do so reliably [..].
#39309: Secure WordPress Against Infrastructure Attacks

This way, in order to compromise the update system, you need to trick the notaries too to accept the false update. It's no longer merely dependent on the update system itself, but uses a set of peers to validate each of those updates.
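
The Merkle tree the notaries mirror can be illustrated with plain sha256sum. The leaf contents below are made up for illustration; the real tree would hold the public keys, revocations and update hashes listed above.

```shell
# Two illustrative leaves: the hash of an update artifact and of a new
# public-key announcement (both contents are placeholders).
h_update=$(printf '%s' 'core-update-contents'  | sha256sum | awk '{print $1}')
h_pubkey=$(printf '%s' 'new-public-key-record' | sha256sum | awk '{print $1}')

# A parent node (here, the root) hashes the concatenation of its
# children, so vouching for one root commits a notary to every leaf.
root=$(printf '%s%s' "$h_update" "$h_pubkey" | sha256sum | awk '{print $1}')
echo "merkle root: $root"
```

A notary that recomputes the same root from the published leaves can confirm that the update it saw is the one everyone else saw; a targeted, silent update would produce a root nobody else can reproduce.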

Show Me The Money Code

The first patches have already been proposed; it's now up to the WordPress security team to evaluate them and raise any concerns they might have: patch1 and patch2.

Most of the work comes from a sodium_compat PHP package that implements the features provided by libsodium, a modern and easy-to-use crypto library.

Source: #39309

Because WordPress supports PHP 5.2.4 and higher (this is an entirely different security threat to WordPress, but let's ignore it for now), a pure PHP implementation of libsodium is needed since the PHP binary extensions aren't supported that far back. The pecl/libsodium extension requires at least PHP 5.4 or higher.

Here's hoping the patches get accepted and can be used soon, as I'm pretty sure there are a lot of parties interested in getting access to the WordPress update infrastructure.

by Mattias Geniar at January 16, 2017 07:30 AM

Chris Siebenmann

Linux is terrible at handling IO to USB drives on my machine

Normally I don't do much with USB disks on my machine, either flash drives or regular hard drives. When I do, it's mostly to do bulk read or write things such as blanking a disk or writing an installer image to a flash drive, and I've learned the hard way to force direct IO through dd when I'm doing this kind of thing. Today, for reasons beyond the scope of this entry, I was copying a directory of files to a USB flash drive, using USB 3.0 for once.

This simple operation absolutely murdered the responsiveness of my machine. Even things as simple as moving windows around could stutter (and fvwm doesn't exactly do elaborate things for that), never mind doing anything like navigating somewhere in a browser or scrolling the window of my Twitter client. It wasn't CPU load, because ssh sessions to remote machines were perfectly responsive; instead it seemed that anything that might vaguely come near doing filesystem IO was extensively delayed.

(As usual, ionice was ineffective. I'm not really surprised, since the last time I looked it didn't do anything for software RAID arrays.)

While hitting my local filesystems with a heavy IO load will slow other things down, it doesn't do it to this extent, and I wasn't doing anything particularly IO-heavy in the first place (especially since the USB flash drive was not going particularly fast). I also tried out copying a few (big) files by hand with dd so I could force oflag=direct, and that was significantly better, so I'm pretty confident that it was the USB IO specifically that was the problem.

I don't know what the Linux kernel is doing here to gum up its works so much, and I don't know if it's general or specific to my hardware, but it's been like this for years and I wish it would get better. Right now I'm not feeling very optimistic about the prospects of a USB 3.0 external drive helping solve things like my home backup headaches.
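For what it's worth, the mitigation most often suggested for exactly this symptom is to cap how much dirty page cache the kernel may accumulate before forcing writeback, so a slow USB device can't buffer huge amounts of data and then stall everything else flushing it. A hedged sketch (the file name and values are illustrative, not a recommendation):

```
# /etc/sysctl.d/90-usb-writeback.conf -- illustrative values only
vm.dirty_background_bytes = 16777216
vm.dirty_bytes = 50331648
```

Whether this actually helps seems to vary by kernel version and hardware, which fits the "it's been like this for years" experience above.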

(I took a look with vmstat to see if I could spot something like a high amount of CPU time in interrupt handlers, but as far as I could see the kernel was just sitting around waiting for IO all the time.)

PS: We have more modern Linux machines with USB 3.0 ports at work, so I suppose I should do some tests with one just to see. If this Linux failure is specific to my hardware, it adds some more momentum for a hardware upgrade (cf).

(This elaborates on some tweets of mine.)

by cks at January 16, 2017 06:33 AM

January 15, 2017

Chris Siebenmann

Link: Let's Stop Ascribing Meaning to Code Points

Manish Goregaokar's Let's Stop Ascribing Meaning to Code Points starts out with this:

I've seen misconceptions about Unicode crop up regularly in posts discussing it. One very common misconception I've seen is that code points have cross-language intrinsic meaning.

He goes on to explain the ways that this is dangerous and how tangled this area of Unicode is. I knew little bits of this already, but apparently combining characters are only the tip of the iceberg.

(via, and see also.)

by cks at January 15, 2017 09:55 PM

January 13, 2017

Errata Security

About that Giuliani website...

Rumors are that Trump is making Rudy Giuliani some sort of "cyberczar" in the new administration. Therefore, many in the cybersecurity community scanned his website "" to see if it was actually secure from hackers. The results have been laughable, with out-of-date software, bad encryption, unnecessary services, and so on.

But here's the deal: it's not his website. He just contracted with some generic web designer to put up a simple page with just some basic content. It's there only because people expect if you have a business, you also have a website.

That website designer in turn contracted some basic VPS hosting service from Verio. It's a service Verio exited around March of 2016, judging by the archived page.

The Verio service promised "security-hardened server software" that they "continually update and patch". According to the security scans, this is a lie, as the software is all woefully out-of-date. According to the OS fingerprint, the FreeBSD image it uses is 10 years old. The security is exactly what you'd expect from a legacy hosting company that's shut down some old business.

You can probably break into Giuliani's server. I know this because other FreeBSD servers in the same data center have already been broken into, tagged by hackers, or are now serving viruses.

But that doesn't matter. There's nothing on Giuliani's server worth hacking. The drama over his security, while an amazing joke, is actually meaningless. All this tells us is that Verio/ is a crappy hosting provider, not that Giuliani has done anything wrong.

by Robert Graham ( at January 13, 2017 05:21 AM

January 10, 2017

Evaggelos Balaskas

Tools I use daily

Post inspired by:


Operating System

I use Archlinux as my primary Operating System. I am currently running Archlinux (since 2009) in all my boxes (laptop/workpc/homepc/odroid-c1). In the data center, I have CentOS on the bare-metal, and CentOS in the VM(s). A windows VM exists for work purposes on my workpc.



For the last few years I have been running fluxbox, but I used to work on xfce. Thunar (the xfce file browser) is my first and only choice, and lilyterm is my terminal emulator, with tmux as my multiplexer. I used to run GNU screen for a decade!

I use arandr for desktop layout (sharing my screen to an external monitor or the TV).


Disk / FileSystem

All my disks are encrypted and I use both ext4 and btrfs on my systems. I really like btrfs (subvolumes) and I use raid-0 and raid-1, but no raid-5 or raid-6 yet. I also have LVM on my laptop, as I cannot change the SSD easily.



Email

Mostly Thunderbird, but I still use mutt in a terminal or over an ssh session.


Editor + IDE

Vim 99% of my time.

for short-time notes: mousepad and when feeling to use a GUI, I use geany.



Browsers

Multiple instances of Firefox, Chromium, Firefox Nightly, Tor Browser and vimprobable2. I used to run midori but I've dropped it. I also have multiple profiles on Firefox! I keep them all in private-mode or incognito, all via a SOCKS proxy (even Tor Browser) with remote DNS (when possible).




but when needed, smuxi or pidgin


Blog / Website

flatpress: no database, static pages, but a dynamic framework written in PHP. There is some custom code on it, but I keep a separate (off-the-web) clone with my custom changes. Recently I added Markdown support and some JavaScript for code highlighting, etc.

I don't tend to write a lot, but I keep personal notes as drafts (unpublished). I also keep a (wackowiki) personal online note-keeping wiki on my domain.


Version Control

Mostly mercurial, but also git. I have a personal hg server (via ssh) for my code, files, notes, etc.



Media

VLC only, for media and podcasts; mirage or feh for image display; gimp for image manipulation.




I wake up, make my double espresso at home and drink it while commuting to work. The 20-minute commute gives the coffee enough time to wake my brain. When at work, I mostly rant about everything.

and alcohol when needed ;)



My fluxbox menu has fewer than 15 apps; I've put only my daily-use programs there, and I try to keep distractions on my desktop to a minimum. I keep notifications disabled for most apps and I mostly work in full screen to minimize input from running apps.

Tag(s): tools

January 10, 2017 10:01 PM

Everything Sysadmin

How Stack Overflow plans to survive the next DNS attack

My coworker did a bang-up job on this blog post. It explains a lot about how DNS works, how the Dyn DDOS attack worked (we missed it because we don't use Dyn), and the changes we made so that we'll avoid similar attacks when they come.

How Stack Overflow plans to survive the next DNS attack

by Tom Limoncelli at January 10, 2017 03:00 PM

Errata Security

NAT is a firewall

NAT is a firewall. It's the most common firewall. It's the best firewall.

I thought I'd point this out because most security experts might disagree, pointing to some "textbook definition". This is wrong.

A "firewall" is anything that establishes a barrier between some internal (presumably trusted) network and the outside, public, and dangerous Internet where anybody can connect to you at any time. A NAT creates exactly that sort of barrier.

What other firewalls (the SPI packet filters) provide is the ability to block outbound connections, not just incoming ones. That's nice, but it's not a critical feature. Indeed, few organizations use firewalls that way; it just causes complaints when internal users cannot access Internet resources.

Another way of using firewalls is to specify connections between a DMZ and an internal network, such as a web server exposed to the Internet that needs a hole in the firewall to access an internal database. While not technically part of the NAT definition, it's a feature of all modern NATs. It's the only way to get some games to work, for example.
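As a sketch of both behaviours -- the address-translation barrier and the deliberate DMZ hole -- here is what a typical Linux home-router NAT looks like in iptables-restore format (interface names and addresses are made up for illustration):

```
*nat
# Rewrite outbound source addresses: the NAT barrier itself. Nothing
# from outside can initiate a connection inward.
-A POSTROUTING -o eth0 -j MASQUERADE
# The one deliberate hole: forward inbound HTTPS to an internal host.
-A PREROUTING -i eth0 -p tcp --dport 443 -j DNAT --to-destination 192.168.1.10:443
COMMIT
```

Everything not explicitly forwarded simply has no translation entry, which is the "fails closed" property discussed below.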

There are already more than 10 billion devices on the Internet, including homes with many devices, as well as most mobile phones. This means that NAT is the most common firewall. The reason hackers find it difficult to hack into iPhones is partly because they connect to the Internet through carrier-grade NAT. When hackers used "alpine" as the backdoor in Cydia, they still had to exploit it over local WiFi rather than the carrier network.

Not only is NAT the most common firewall, it's the best firewall. Simple SPI firewalls that don't translate addresses have an inherent hole in that they "fail open". It's easy to apply the wrong firewall ruleset, either permanently or just for a moment. You see this on internal IDS, where for no reason there's suddenly a spike of attacks against internal machines because of a bad rule. Every large organization I've worked with can cite examples of this.

NAT, on the other hand, fails closed. Common mistakes shut down access to the Internet rather than open up access from the Internet. The benefit is so compelling that organizations with lots of address space really need to give it up and move to private addressing instead.

The definition of "firewall" is malleable. At one time it included explicit and transparent proxies, for example, which were once the most popular type. These days, many people think of only stateful packet inspection filters as the "true" firewall. I take the more expansive view of things.

The upshot is this: NAT is by definition a firewall. It's the most popular firewall. It's the best firewalling technology.

Note: Of course, no organization should use firewalls of any type. They break the "end-to-end" principle of the Internet, and thus should be banned by law.

by Robert Graham ( at January 10, 2017 05:22 AM

No, Yahoo! isn't changing its name

Trending on social media is how Yahoo is changing its name to "Altaba" and CEO Marissa Mayer is stepping down. This is false.

What is happening instead is that everything we know of as "Yahoo" (including the brand name) is being sold to Verizon. The bits that are left are a skeleton company that holds stock in Alibaba and a few other companies. Since the brand was sold to Verizon, that investment company could no longer use it, so it chose "Altaba". Since 83% of its investment is in Alibaba, "Altaba" makes sense. It's not like this new brand name means anything -- the skeleton investment company will be wound down in the next year, either as a special dividend to investors, sold off to Alibaba, or both.

Marissa Mayer is an operations CEO. Verizon didn't want her to run their newly acquired operations, since the entire point of buying them was to take the web operations in a new direction (though apparently she'll still work a bit with them through the transition). And of course she's not an appropriate CEO for an investment company. So she had no job left -- she made her own job disappear.

What happened today is an obvious consequence of Alibaba going IPO in September 2014. It meant that Yahoo's stake of 16% in Alibaba was now liquid. All told, the investment arm of Yahoo was worth $36-billion while the web operations (Mail, Fantasy, Tumblr, etc.) was worth only $5-billion.

In other words, Yahoo became a Wall Street mutual fund that inexplicably also offered web mail and cat videos.

Such a thing cannot exist. If Yahoo didn't act, shareholders would start suing the company to get their money back. That $36-billion in investments doesn't belong to Yahoo; it belongs to its shareholders. Thus, the moment the Alibaba IPO closed, Yahoo started planning how to separate the investment arm from the web operations.

Yahoo had basically three choices.
  • The first choice is simply to give the Alibaba (and other investment) shares as a one-time dividend to Yahoo shareholders. 
  • A second choice is to split the company in two, one of which has the investments, and the other the web operations. 
  • The third choice is to sell off the web operations to some chump like Verizon.

Obviously, Marissa Mayer took the third choice. Without a slush fund (the investment arm) to keep it solvent, Yahoo didn't feel it could run its operations profitably without integration with some other company. That meant it either had to buy a large company to integrate with Yahoo, or sell the Yahoo portion to some other large company.

Every company, especially an Internet one, has a legacy value. It's the amount of money you'd get from firing everyone, stopping investment in the future, and just raking in a stream of declining revenue year after year. It's the fate of early Internet companies like Earthlink and Slashdot. It's what I documented with Earthlink [*], which continues to offer email to subscribers, but spends only enough to keep the lights on, not even upgrading to the simplest of things like SSL.

Presumably, Verizon will try to make something of a few of the properties. Apparently, Yahoo's Fantasy sports stuff is popular, and will probably be rebranded as some new Verizon thing. Tumblr is already its own brand name, independent of Yahoo, and thus will probably continue to exist as its own business unit.

One of the weird things is Yahoo Mail. It's permanently bound to the "" domain, so you can't do much with the "Yahoo" brand without bringing Mail along with it. Though at this point, the "Yahoo" brand is pretty tarnished. There's not much new you can put under that brand anyway. I can't see how Verizon would want to invest in that brand at all -- it'll just milk it for what it can over the coming years.

The investment company cannot long exist on its own. Investors want their money back, so they can make future investment decisions on their own. They don't want the company to make investment choices for them.

When Yahoo made its initial $1-billion investment for 40% of Alibaba in 2005, it did not do so because it was a good "investment opportunity", but because Yahoo believed it was a good strategic investment, such as providing an entry into the Chinese market, or providing an e-commerce arm to compete against eBay and Amazon. In other words, Yahoo didn't consider it a good way of investing its money, but a good way to create a strategic partnership -- one that just never materialized. From that point of view, the Alibaba investment was a failure.

In 2012, Marissa Mayer sold off 25% of Alibaba, netting $4-billion after taxes. She then lost all $4-billion on the web operations. That stake would be worth over $50-billion today. You can see the problem: companies with large slush funds just fritter them away keeping operations going. Marissa Mayer abused her position of trust, playing with money that belonged to shareholders.

Thus, Altaba isn't going to play with shareholders' money. It's a skeleton company, so there's no strategic value to its investments. It can make no better investment choices than its shareholders can with their own money. Thus, the only purpose of the skeleton investment company is to return the money to the shareholders. I suspect it'll choose the most tax-efficient way of doing this, like selling the whole thing to Alibaba, which would just exchange the Altaba shares for Alibaba shares, with a 15% bonus representing the value of the other Altaba investments. Either way, if Altaba is still around a year from now, it's because its board is skimming money that doesn't belong to them.

Key points:

  • Altaba is the name of the remaining skeleton investment company, the "Yahoo" brand was sold with the web operations to Verizon.
  • The name Altaba sucks because it's not a brand name that will stick around for a while -- the skeleton company is going to return all its money to its investors.
  • Yahoo had to spin off its investments -- there's no excuse for 90% of its market value to be investments and 10% in its web operations.
  • In particular, the money belongs to Yahoo's investors, not Yahoo the company. It's not some sort of slush fund Yahoo's executives could use. Yahoo couldn't use that money to keep its flailing web operations going, as Marissa Mayer was attempting to do.
  • Most of Yahoo's web operations will go the way of Earthlink and Slashdot, as Verizon milks the slowly declining revenue while making no new investments in it.

by Robert Graham ( at January 10, 2017 04:13 AM

January 09, 2017

That grumpy BSD guy

A New Year, a New Round of pop3 Gropers from China

They've got a list, and they're sticking to it. Do they even know or care it's my list of spamtraps?

Yes, the Chinese are at it again. Or rather, machines with IP addresses that belong in a small set of Chinese province networks have started a rather intense campaign of trying to access the pop3 mail retrieval protocol on a host in my care, after a longish interval of near-total inactivity.

This is the number of failed pop3 login attempts to my system per day so far in 2017:

January 1:     4
January 2:   145
January 3:    20
January 4:    51
January 5:    32
January 6:    36
January 7:  4036
January 8:  5956
January 9:  5769

Clearly, something happened on January 7th, and whatever started then has not stopped yet. On that day we see a large surge in failed pop3 logins, sustained over the next few days, and almost exclusively attempts at the username part of entries from my list of spamtrap addresses. Another notable feature of this sequence of attempts is that they come almost exclusively from a small set of Chinese networks.

The log of the failed attempts is available in raw form here, while this spreadsheet summarises the log data in a format oriented to IP address and username pairs and the attempts at each. The spreadsheet also contains netblock information and the country or territory each range is registered to. (Note: when importing the .csv, please specify the "User name" column as text, otherwise conversion magic may confuse matters.)

The numbers for January 7th onwards would have been even higher had it not been for a few attempts to access accounts that actually exist; my reaction to those was to block (for 24 hours only) the entire netblock given in the whois info for the offending IP address. Some of those blocks were quite substantial. I've also taken the liberty of removing the entries with real usernames from the logs.

Now despite urging from friends to publish quickly, I've silently collected data for a few days (really just a continuation of the collecting that started with last year's episode described in the Chinese Hunting Chinese Over POP3 In Fjord Country article, which in turn contains links to the data that by now covers almost a full year).

Now disregarding the handful of real user IDs I've already removed from the data set, the only new user IDs we have seen this year are:


The rest were already in the spamtraps list, as user name parts. As you will have guessed, those two have been duly included there as well, with appended in order to form a somewhat believable spamtrap email address.

What, then, can we expect to see over the next few days?

The progression so far has proceeded from trap user names starting with 0, ascended through the numerics, and has now (January 9) moved on to the early alphabetics. The list of spamtraps is just shy of 35,000 entries, and I assume the entries I see here come out of some larger corpus that our somewhat inept cyber-criminals use.

If you too are seeing larger than usual numbers of pop3 login failures and anything even vaguely resembling the patterns of mischief described here, I would like to hear from you. If your logs follow a format somewhat resembling mine, it is most likely trivial to modify the scripts (in the same directories as the data) to extract the data into the slightly more database- or spreadsheet-friendly CSV format.
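Assuming a syslog-style mail log (the "LOGIN FAILED" marker and field positions below are my own invention -- adjust to whatever your pop3 daemon logs), a per-day summary like the counts above reduces to a short pipeline, shown here against a self-contained sample:

```shell
# Build a tiny sample log so the pipeline can be run as-is.
printf '%s\n' \
  'Jan  7 01:02:03 mail pop3d: LOGIN FAILED, user=0sales, ip=[203.0.113.4]' \
  'Jan  7 02:03:04 mail pop3d: LOGIN FAILED, user=0info, ip=[203.0.113.5]' \
  'Jan  8 03:04:05 mail pop3d: LOGIN FAILED, user=0web, ip=[203.0.113.6]' \
  > sample.log
# Count failed logins per day, then emit "date,count" CSV lines.
grep 'LOGIN FAILED' sample.log | awk '{print $1" "$2}' | sort | uniq -c \
  | awk '{print $2" "$3","$1}'
```

Swapping the first awk for a sed that pulls out the user= or ip= field gives the per-username or per-address view instead.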

From my perch here it is difficult to determine whether the people responsible for the networks that feature prominently in the data are cooperating with the cybercriminals or whether they are simply victims of their own substandard security practices.

If you are in a position to shed light on that, I would like to hear from you, and I will do my best to protect your anonymity unless you specify otherwise.

In the meantime, expect the data, (both the full set starting in January 2016 and the 2017-only set) to be updated at frequent, quasi-random intervals.

If you need some background on what the spamtrap list is and some history of related incidents over the last few years, I can recommend reading these articles:

Hey, spammer! Here's a list for you! (July 2007) - a light introduction to greytrapping
Maintaining A Publicly Available Blacklist - Mechanisms And Principles (April 2013) - on greytrapping principles and practice
Effective Spam and Malware Countermeasures - Network Noise Reduction Using Free Tools (2008 - 2014) - a more thorough treatment of the whole spam and malware complex
Password Gropers Take the Spamtrap Bait (August 2014) - the first time I noticed pop3 logins for my spamtraps, and of course
Chinese Hunting Chinese Over POP3 In Fjord Country (August 2016) - about a previous bizarre episode involving Chinese networks and pop3 activity.

by Peter N. M. Hansteen ( at January 09, 2017 08:17 PM

Ansible-cmdb v1.19: Generate a host overview of Ansible facts.

I've just released ansible-cmdb v1.19. Ansible-cmdb takes the output of Ansible's fact gathering and converts it into a static HTML overview page containing system configuration information. It supports multiple templates (fancy html, txt, markdown, json and sql) and extending information gathered by Ansible with custom data.

This release includes the following bugfixes:

  • Always show stack trace on error and include class name.
  • Exit with proper exit codes.
  • Exclude certain file extensions from consideration as inventories.
  • Improved error reporting and lookups of templates.
  • Improved error reporting when specifying inventories.

As always, packages are available for Debian, Ubuntu, Red Hat, CentOS and other systems. Get the new release from the Github releases page.

by admin at January 09, 2017 01:52 PM

Bexec: Execute script in buffer and display output in buffer. Version 0.10 released.

After almost a year without releases, I've made a new release of Bexec today. It's a minor feature release that brings a new setting to Bexec: bexec_splitsize. This setting controls the default size of the output window. You can set it in your .vimrc as follows:

let g:bexec_splitsize=20

This will always make the output window 20 lines high.

It's been almost exactly ten years since the first version of Bexec was released. Version 0.1 was uploaded on January 30th of 2007. That makes about one release per year on average for Bexec ;-) Perhaps it's time for v1.0 after all this time…

by admin at January 09, 2017 06:55 AM

Sarah Allen

the danger of a single story

Chimamanda Ngozi Adichie’s the danger of a single story illustrates how we are influenced by the stories we read and the stories we tell.

Chimamanda Ngozi Adichie speaking

She introduces the talk telling about how reading British and American children’s books influenced her own childish writings that featured foreign landscapes and experiences, rather than drawing from her own life. I remembered how my mom pointed out to me years later the drawings I had made when we lived in Saint Lucia. All my houses had chimneys, even though we lived in this very hot, tropical climate with no fireplaces.

She tells about her experience of negative bias where well-meaning people made assumptions about Africa. She also shares how she inadvertently made assumptions about others based on a single narrative that excluded the possibility of other attributes and experience.

It is impossible to talk about the single story without talking about power. There is a word, an Igbo word, that I think about whenever I think about the power structures of the world, and it is “nkali.” It’s a noun that loosely translates to “to be greater than another.” Like our economic and political worlds, stories too are defined by the principle of nkali: How they are told, who tells them, when they’re told, how many stories are told, are really dependent on power… Power is the ability not just to tell the story of another person, but to make it the definitive story of that person.

It resonates with me how the negative stories flatten her experience. The creation of stereotypes by such storytelling is problematic not because they are entirely false, but because they are incomplete. “They make one story become the only story.”

“The consequence of the single story is this: It robs people of dignity. It makes our recognition of our equal humanity difficult. It emphasizes how we are different rather than how we are similar… when we reject the single story, when we realize that there is never a single story about any place, we regain a kind of paradise.”

The post the danger of a single story appeared first on the evolving ultrasaurus.

by sarah at January 09, 2017 12:06 AM

haproxy: restrict specific URLs to specific IP addresses

This snippet shows you how to use haproxy to restrict certain URLs to certain IP addresses. For example, to make sure your admin interface can only be accessed from your company IP address.
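A minimal sketch of the pattern, with all the names (frontend, backend, path and the office address) as placeholders:

```
frontend www
    bind :80
    # Match the URLs we want to protect and the addresses allowed in.
    acl is_admin path_beg /admin
    acl office_ip src 203.0.113.4
    # Deny admin requests coming from anywhere else.
    http-request deny if is_admin !office_ip
    default_backend webservers
```

The same src acl can take a CIDR range (e.g. 203.0.113.0/24) or be loaded from a file with `src -f`.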

January 09, 2017 12:00 AM

January 08, 2017

Steve Kemp's Blog

Patching scp and other updates.

I use openssh every day, be it the ssh command for connecting to remote hosts, or the scp command for uploading/downloading files.

Once a day, or more, I forget that scp uses the non-obvious -P flag for specifying the port, not the -p flag that ssh uses.

Enough is enough. I shall not file a bug report against the Debian openssh-client package, because no doubt compatibility with both upstream and other distributions is important. But damnit, I've had enough.

apt-get source openssh-client shows the appropriate code:

    fflag = tflag = 0;
    while ((ch = getopt(argc, argv, "dfl:prtvBCc:i:P:q12346S:o:F:")) != -1)
          switch (ch) {
            case 'P':
                    addargs(&remote_remote_args, "-p");
                    addargs(&remote_remote_args, "%s", optarg);
                    addargs(&args, "-p");
                    addargs(&args, "%s", optarg);
                    break;
            case 'p':
                    pflag = 1;
                    break;

Swapping those two flags around, and updating the format string appropriately, was sufficient to do the necessary.

In other news I've done some hardware development, using both Arduino boards and the WeMos D1-mini. I'm still at the stage where I'm flashing lights, and doing similarly trivial things:

I have more complex projects planned for the future, but these are on-hold until the appropriate parts are delivered:

  • MP3 playback.
  • Bluetooth-speakers.
  • Washing machine alarm.
  • LCD clock, with time set by NTP, and relay control.

Even with a few LEDs though I've had fun, for example writing a trivial binary display.

January 08, 2017 04:39 PM

January 07, 2017

Sarah Allen

what i’ve learned in the past year

At first I thought I hadn’t learned anything new, just the same lessons I keep learning every year. Then I realized that I’ve learned new techniques that let me apply what I’ve previously understood in ways that work better.

Lessons I need to keep learning:

  1. Everything is about the people. My relationships with other humans are more important to me than anything else. Since much of my life is spent making software, I think a lot about how this applies to my work. Software is really made of people: people’s ideas, conflicts, errors, understanding or misunderstanding the needs of others or the limits of machines. We need to work well with other people in order to do everything, or at least to do the things that really matter. The so-called soft skills are hard.
  2. Why is more important than what. If we agree on doing something, but we’re doing it for different reasons, it typically doesn’t have happy outcomes for anyone. For that same reason, working with people who share your values is incredibly powerful. Our values influence our decision-making, sometimes so fundamentally that we don’t realize that we are making a decision at all.

This year I combined these lessons into a very different approach to finding my way in the world. Part of the reason I can do this is because I know a lot of people and have grown comfortable outside of my comfort zone. Or rather, I have discovered that what I used to think of as boundaries create a false comfort, and have gained experience in creating boundaries in my interactions which create safety in new experiences.

Find your people.

When I started doing business development for my own consulting company, I realized that there are different ways of doing business that co-exist in our capitalist economy. There are business people who are competing with each other to win, where in order to win, someone else has to lose. Success is gained at someone else’s expense. There are a lot of successful people who work that way, but it’s not how I work.

In my very first startup, I came up with a very simple formula for understanding this business of making software: if you make something that people need, especially if it’s something they need to do their work, they will be happy to give you some of their money.

My idea of a successful business transaction is when I am happy to do the work because I get paid to create something wonderful in collaboration with smart, interesting people. Then what I get paid feels like a lot of money, plus I’m gaining experience that I value. If we negotiate well and set expectations effectively, then the customer feels like it wasn’t that much money relative to the value of this awesome thing we created together. Of course, every business deal didn’t work out that way even with the best of intentions, but I learned to notice which people weren’t even trying to create that situation.

I applied this idea earlier this year when looking for my new job. I talked to people who I had really enjoyed working with, who I felt were doing interesting things. Those people introduced me to other people. I didn’t have time to talk to everyone I wanted to meet or reconnect with, so I followed my heart and met with people who I most wanted to talk with. It felt a little random at times, yet it was quite intentional. I prioritized the companies with the people who told stories about their work that inspired me, or made me laugh, where it seemed like some things would be easy and most of the difficult things would be fun.

I spent more time talking with people who were honest, whose truths reflected my own, who caused me to think and reflect. I also prioritized people who also wanted to work with me. That sounds kind of obvious in a job search, but I mean something very specific. Of all the people who would like a person with my skills on their team, there is a smaller group who actually want to work with me, with all my quirks and diverse interests, where seemingly unrelated talents and skills are valued as part of the team.

I ended up taking a job at Google, and inside this huge company where there are lots of different kinds of people, I found a community of like-minded folk. Most of those people don’t even know each other, but they help me stay connected to my own values and help me navigate a strange new world. I stay connected with other industry colleagues through Bridge Foundry, a wide network of civic hackers, and small gatherings of friends. Every week I try to have lunch or coffee with someone awesome who I don’t work with day-to-day. Allowing myself to care about the kinds of people I work with, and staying connected with a wider group of wonderful people, has had a profound, enriching effect on my day-to-day life.

Be nice to the other humans.

I used to feel like I had to figure out who the good and the bad people were, or the good people and the other people who needed to be enlightened, but just didn’t know it yet. I had to try really hard not to be judgmental, and it was really hard to be nice to people who didn’t meet my standards, except, more often than not, I didn’t actually meet my own standards. Despite my best efforts, I kept screwing up.

I got excited about agile development and lean startup where you are supposed to fail fast and learn, but we can’t A/B test relationships. I realized that sometimes talking about a thing is a whole message unto itself. You are saying “I think our spending time on this is the most important thing we each could be doing right now.” Of course, sometimes it is, but often not at all.

If something might be a misunderstanding, it might not be that important for either of us. If I’m not going to be working with you, or might not even see you again this year or ever, and you aren’t actually hurting anyone, maybe I should just be nice. Maybe I should give you the benefit of the doubt, think of the best possible reason you might have said that, or focus on something else you said that was much more interesting. Then when we meet again, if it’s important and still relevant, we can figure it out. More likely, things will have changed, and we will have changed.

We invent ourselves in each moment. These shared experiences are precious moments of our lives. It may seem obvious or inconsequential, but being nice, genuinely nice, makes the day just a little bit brighter and leaves the way open for new opportunities.

The post what i’ve learned in the past year appeared first on the evolving ultrasaurus.

by sarah at January 07, 2017 06:31 PM

January 06, 2017

Sean's IT Blog

Horizon 7.0 Part 12–Understanding Horizon Remote Access

When you decouple the user from the physical hardware that sits on their desk, you provide new opportunities to change the way they work because they are no longer tethered to their desk. If you can provide secure remote access to their desktop, they are no longer tied to their VPN connection or corporate laptop.

Horizon View provides a secure method for granting users access to their desktops from anywhere with an Internet connection on any device without needing a VPN connection.  Now that a desktop pool has been set up and desktops are provisioned, it’s time to set up that remote access.

The Security Server

The View Security Server is VMware’s original method of addressing remote access.  This component of the Horizon View environment contains a subset of the Connection Server components, and it is designed to sit in a DMZ and act as a gateway for Horizon View Clients.  It’s essentially a reverse proxy for your View environment.

Each Security Server that is deployed needs a corresponding Connection Server, and they are paired during the installation process.  Because the Security Server is an optional component, each Connection Server is not required to have one, and a Connection Server cannot be paired to more than one Security Server.

Each Security Server also needs a static IP address.  If it is externally facing, it will need to have a publicly addressable static IP.  This IP address does not need to be configured on the server’s network card as both Static 1:1 NAT and PAT work with Horizon View.

Since the Security Server is built on a subset of Connection Server components, it requires a Windows Server-based operating system.  This may require putting Windows servers into a DMZ network, and this can present some security and management challenges.

Security Server Alternatives

There are two alternatives for providing remote access to Horizon environments if you don’t want to place Windows servers into a DMZ environment.  These two alternatives are the Horizon Access Point, a hardened purpose-built remote access appliance for Horizon and Airwatch, and the F5 Access Policy Manager for Horizon. 

The Horizon Access Point was officially released for Horizon environments with Horizon 6.2.2, and it has received new features and improvements with every major and minor Horizon release since.  In addition to being a Security Server replacement, it can also act as a reverse proxy for VIDM and as an endpoint for Airwatch Tunnels to connect on-premises services with a cloud-hosted Airwatch environment.  The Access Point is designed to be disposable.  When the Access Point needs an upgrade or a settings change (such as a certificate replacement), or when it breaks, the appliance is meant to be discarded and a new one deployed in its place.  The Access Point also has no management interface, but it does have a REST API that can be used to view configuration details and monitor the number of connections passing through it.

The F5 Access Policy Manager is a feature of the F5 Application Delivery Controller.  Access Policy Manager provides context-aware secure remote access to applications and other resources.  One of the features of APM is a Horizon Proxy.  The Horizon Proxy can authenticate users to the Horizon environment and handle both PCoIP and Blast connections.  F5 APM is configured using a Horizon iApp Rule – a template with all of the F5 rules required for Horizon and a graphical interface for configuring it to your particular environment.  The APM feature is licensed separately from other F5 features, and there is an additional cost for F5 APM licensing.

The table below outlines the features of the Security Server, Access Point, and F5 APM.


                          Security Server                 Access Point                                           F5 Access Policy Module
Platform                  Windows Server                  Virtual appliance                                      Physical or virtual appliance
Protocol Support          PCoIP, Blast Extreme            PCoIP, Blast Extreme                                   PCoIP, Blast Extreme
Interaction with Horizon  Paired with Connection Servers  HTTPS connection to load-balanced Connection Servers   HTTPS connection to pool of Connection Servers
Two-Factor Auth Support   Handled by Connection Servers   RSA, RADIUS-based                                      RSA, RADIUS-based
Deployment Method         Manual                          Scripted                                               GUI-based

Security Server Firewall Ports

In order to enable remote access, a few ports need to be opened on any firewalls that sit between the network where the Security Server has been deployed and the Internet.  If the server is deployed into a  DMZ, the firewall will also need to allow traffic between the Security Server and the Connection Server.

The rules that are required on the front-end, Internet-facing firewall are:

  • HTTP – TCP 80 In
  • HTTPS – TCP 443 In
  • HTTPS – UDP 443 In (for Blast Extreme UDP Connections)
  • HTTPS – TCP 8443 both directions (if Blast is used with the Security Server)
  • PCoIP – TCP 4172 In, UDP 4172 both directions
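Once those rules are in place, a quick external sanity check helps confirm the TCP ports actually answer. The sketch below is a minimal Python probe, not a full validation: the hostname is hypothetical, and the UDP ports and bidirectional rules still have to be verified separately.

```python
import socket

# Front-end TCP ports required for Horizon remote access; UDP 443/4172
# and the return-traffic rules must be checked by other means.
PORTS = {
    80: "HTTP",
    443: "HTTPS",
    4172: "PCoIP",
    8443: "Blast (Security Server)",
}

def check_tcp_port(host, port, timeout=3):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def audit(host):
    """Map each required port number to whether it currently answers on host."""
    return {port: check_tcp_port(host, port) for port in PORTS}
```

Running `audit("view.example.com")` (a made-up external name for the Security Server) from outside the firewall shows at a glance which rules are working.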

Backend firewall rules between the remote access solution and the Horizon Connection Servers and desktops depend on the remote access solution being configured.  The following table outlines the ports that need to be opened between the DMZ and internal networks.

Port   Protocol                              Zone                             Notes
443    TCP (HTTPS)                           DMZ to Connection Servers        Access Point only
4172   TCP/UDP (PCoIP)                       DMZ to virtual desktop subnets
22443  TCP/UDP (Blast)                       DMZ to virtual desktop subnets
9427   TCP (Client Drive Redirection/MMR)    DMZ to virtual desktop subnets
500    UDP (IPsec)                           DMZ to Connection Servers        Security Server only
4500   UDP (NAT-T ISAKMP)                    DMZ to Connection Servers        Security Server only

If you are deploying your Security Servers in a DMZ configuration with a back-end firewall, you need to configure your firewall to allow IPSEC traffic to the Connection Servers.  These rules depend on whether network address translation is used between the DMZ and Internal network.  For more information on the rules that need to be enabled, please see this VMware KB article.

Note: If you’re using application-aware firewalls like Palo Alto Networks devices, make sure that any application protocols required by Horizon View aren’t blocked between the DMZ and Internal network.  Also, updates to the application signatures or the PCoIP protocol may impact users’ access to virtual desktops.

So Which Should I Use?

The million dollar question when deploying a brand new Horizon environment is: which remote access method should I use?  The answer is “whichever one fits your needs the best.”  When designing remote access solutions for Horizon, it is important to understand the tradeoffs of using the different options and to evaluate options during the pilot phase of the project.

If possible, I would recommend staying away from the Security Server now that there are other options for remote access.  I make this recommendation to clients because the Access Point has feature parity with the Security Server, and it avoids the security and management hassles of deploying Windows Servers into an organization’s DMZ network.

by seanpmassey at January 06, 2017 07:56 PM


The Ultimate Game Boy Talk [video]

This is the video recording of “The Ultimate Game Boy Talk” at 33C3.

I will post the slides in Apple Keynote format later.

If you enjoyed this, you might also like my talks

by Michael Steil at January 06, 2017 01:34 PM


My Favourite Books in 2016

I planned to read 36 books in 2016 and managed to hit that number a few hours before the New Year! The best of those 36 books are listed below.

Business, Management and Leadership

Considering the new role I’ve started in January 2016 (first-time CTO of a growing startup company), my reading last year was heavily geared towards business, management and leadership topics. Here are my favourite books in this category:

  • “The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers” — in my opinion, a must-read book for anybody interested in starting a company or already building one. A treasure trove of great advice for startup founders on building and managing their teams.
  • “Crossing the Chasm: Marketing and Selling High-Tech Products to Mainstream Customers” — the author explains why so many companies that find an initial product-market fit subsequently fail to grow into leaders of their respective markets and often die a slow and painful death. The concept of the chasm and, especially, the idea of the whole product were very powerful for my understanding of what I had felt in many companies I worked for — mainstream customers cannot use your product unless they are provided with a minimum set of components and services to solve their problem. A very important read for leaders of modern SaaS companies, especially for API/platform enterprises.
  • “Turn the Ship Around!: A True Story of Turning Followers into Leaders” — an inspiring story of a navy captain transforming one of the worst-performing crews in the fleet into a perfectly functioning team by pushing control down to individual team members.
  • “The Score Takes Care of Itself” — an inspiring story of one of the greatest sports team transformations and the man behind it, legendary coach Bill Walsh.


A few more books I found very interesting:

  • “The Collapse of Parenting: How We Hurt Our Kids When We Treat Them Like Grown-Ups” — maybe it is just confirmation bias, but I absolutely loved this book. The author focuses on a few serious problems in today’s parenting and the resulting decline in the achievement and psychological health of American children. He finally managed to put into words something that had been bothering me for the 10 years since moving to Canada. Now that I have become a parent and will have to raise a child in this environment, I was glad to hear that I wasn’t crazy for not agreeing with the approach that is being pushed on modern parents by American society.
  • “Being Mortal: Medicine and What Matters in the End” — one of my favourite authors, Atul Gawande, explores the current state of end-of-life care in the USA, Canada, and Western Europe. Terrifying at first, the book makes you consider your own mortality and think about the choices you are bound to make eventually for yourself and, potentially, for your close family members.
  • “Sapiens: A Brief History of Humankind” — a captivating overview of our history as a human species, from 70,000 years ago until the 20th century: how we evolved, how we affected other species on the planet, and how we ended up where we are today. A long but very interesting read!
  • “The Road To Sparta: Retracing the Ancient Battle and Epic Run that Inspired the World’s Greatest Foot Race” — a fascinating story of Dean Karnazes (one of the most famous ultra-distance runners in the world) and his exploration of the legend of the Marathon. Highly recommended to anybody interested in running.
  • “Catastrophic Care: How American Health Care Killed My Father — and How We Can Fix It” — a very detailed overview of what is broken in US healthcare today. Even if you don’t have anything to do with the US healthcare market, the book is a great collection of stories about the side-effects of what initially looked like great ideas but ended up screwing up the system even further.


I have always been a huge fan of science fiction, and this past year I discovered a few real gems that ended up on my all-time favourites list:

  • “Remembrance of Earth’s Past” (aka “The Three-Body Problem”) series by Chinese author Liu Cixin — a huge universe, highly detailed and powerful characters, a timeline spanning centuries — you can find all of it here. But on top of the standard components of a good space opera, there is a layer of Chinese culture, language, and philosophy that was previously unknown to me.
    This trilogy has become an instant classic for me and is in the top 10 of my all-time favourites, next to Asimov’s “Foundation” and Peter F. Hamilton’s “Void”.
  • Everything from Neil Gaiman! Up until this year, when I first got exposed to his writing, I never realized how much pleasure one could get from reading prose. I’m not sure how he does it, but if he were to publish a book of obituaries or classifieds, I’d be willing to read that too — I enjoyed his English so much! Favourite books so far: “The Graveyard Book” and “The Ocean at the End of the Lane”.

I hope you enjoyed this overview of the best books I’ve read in 2016. Let me know if you liked it!

by Oleksiy Kovyrin at January 06, 2017 01:28 AM

January 05, 2017

Everything Sysadmin

NYCDevOps meetup is re-starting!

Hey NYC-area folks!

The NYC DevOps meetup is springing back to life! Our next meeting will be Tuesday, January 17, 2017 from 6pm-7:30pm. The meeting is at the Stack Overflow NYC headquarters near the financial district. For complete details, visit

From the announcement:

Please join us on January 17th from 6:00 - 7:30 PM at Stack Exchange for our first annual DevOps Mixer. Our goal is to re-engage with our members for an inaugural meet and greet with our new team of organizers, awesome community members, and of course there will be refreshments! Come socialize with us and talk about your experiences, what's new, what you're working on and what you would like to see from the NYC DevOps Meetup.

We're also looking for members of the local community to participate in future meetups by giving some great talks about things that you're working on, and participate on interactive panels. Come with ideas on topics you'd like to hear about!

Finally, please provide your feedback on how we can best serve the NYC DevOps Community members via this survey:

Hope to see you there!


by Tom Limoncelli at January 05, 2017 03:00 PM

Geek and Artist - Tech

iframe-based dashboards don’t work in 2017

At $current_employer (unlike $previous_employer, where all these problems were sorted out), we have great huge TVs in every room but no consistently useful use of them. I love seeing big, beautiful dashboards and KPIs visualised everywhere, but right now we just don’t have that in place. No matter; this is part of my mission to improve engineering practices here, and I’m happy to tackle it.

The last time I felt I had to do this was back in about 2013. My team was fairly small at 2-3 people including myself, and there was no company-wide dashboarding solution in place. The list of commercial and open source solutions was much smaller than it is today. We ended up using a Mac Mini (initially with Safari, later Chrome) and some tab rotation extension to do the job of rotating between various hard-coded HTML pages I had crafted by hand, which aggregated numerous Graphite graphs into a fixed table structure. Hm.

While there are many solutions to displaying dashboards, collecting and storing the data, actually hosting the infrastructure and what drives the TV still seems a bit fiddly. You could try using the TV’s built-in web browser if it is a smart TV (low-powered, usually no saved settings if you turn the TV off, not enough memory, questionable HTML5 support), Chromecast (not independent from another computer), Raspberry Pi (low-powered, not enough memory), or some other small form-factor PC. The ultimate solution will probably require some common infrastructure to be deployed along the lines of Concerto, which I’ve used before but don’t want to wait for that to be set up yet.

The simplest possible solution is to host a small static HTML file on the machine, load it in the browser and have that page rotate through a hard-coded set of URLs by loading them in an iframe. I came up with this code in a few minutes and hoped it would work:

<body style="margin:0px">
  <iframe id="frame"></iframe>
  <script type="text/javascript">
    function rotateDashboard(urls, refreshPeriod) {
      var frame = document.getElementById("frame");
      frame.src = urls[0];

      // Put the current URL on the back of the queue and set the next refresh
      urls.push(urls.shift());
      setTimeout(rotateDashboard, refreshPeriod * 1000, urls, refreshPeriod);
    }

    // Set up iframe
    var frame = document.getElementById("frame");
    frame.height = screen.height;
    frame.width = screen.width;
    frame.style.border = "none";
    frame.seamless = true;

    // Fetch the refresh period and URL list, then start the rotation
    var xhr = new XMLHttpRequest();
    xhr.onload = function(e) {
      var json = JSON.parse(xhr.responseText);
      var refresh = json.refresh;
      var urls = json.urls;

      rotateDashboard(urls, refresh);
    };
    xhr.open("GET", "");
    xhr.send();
  </script>
</body>

For the first couple of locally hosted dashboards and another static website for testing, it worked, but for the first Librato-based dashboard it immediately failed due to the X-Frame-Options header in the response being set to DENY. Not being a frontend-savvy person, I’d only vaguely known of this option but here it actually blocked the entire concept from working.
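In hindsight, a small pre-flight check would have caught this before the iframe silently refused to load. The sketch below (Python standard library only; it conservatively treats any `frame-ancestors` CSP directive as blocking, even though `'self'` would permit same-origin framing) reports whether a candidate dashboard URL can be framed:

```python
from urllib.request import urlopen

def headers_allow_framing(headers):
    """Decide from response headers whether a page may be shown in an iframe.

    Checks X-Frame-Options (DENY / SAMEORIGIN) and the frame-ancestors
    directive of Content-Security-Policy, which modern browsers honour.
    `headers` is any mapping with a .get() method, e.g. resp.headers.
    """
    xfo = (headers.get("X-Frame-Options") or "").upper()
    if xfo in ("DENY", "SAMEORIGIN"):
        return False
    csp = (headers.get("Content-Security-Policy") or "").lower()
    return "frame-ancestors" not in csp

def frameable(url, timeout=5):
    """Fetch url and report whether its headers permit framing."""
    with urlopen(url, timeout=timeout) as resp:
        return headers_allow_framing(resp.headers)
```

Filtering the URL list through `frameable()` before handing it to the rotation script would have flagged the Librato dashboard up front.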

So, TL;DR – you can’t host rotating dashboards in an iframe, given the security settings most browsers and respectable websites obey in 2017. This is probably well-known to anyone who has done any reasonable amount of web coding in their lives, but to a primarily backend/infrastructure person it was a surprise. So, the Earliest Testable Product in this case needs to be with a tab rotation extension in the browser. You might argue this is simpler, but I was looking forward to maintaining configuration in a flexible manner as you can see in the script above. In any case, by the end of today I’ll have such a system running and the team will start to enjoy the benefits of immediate visibility of critical operational data and KPIs!

by oliver at January 05, 2017 09:16 AM

January 04, 2017

Anton Chuvakin - Security Warrior

Annual Blog Round-Up – 2016

Here is my annual "Security Warrior" blog round-up of the top 10 popular posts/topics in 2016. Note that my current Gartner blog is where you go for my recent blogging; all of the content below predates 2011.

  1. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009. Is it relevant now? You be the judge. Succeeding with SIEM requires a lot of work, whether you paid for the software or not.
  2. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here!
  3. “Simple Log Review Checklist Released!” is often at the top of this list – the checklist is still a very useful tool for many people. “On Free Log Management Tools” is a companion to the checklist (updated version).
  4. My classic PCI DSS Log Review series is always hot! The series of 18 posts covers a comprehensive log review approach (OK for PCI DSS 3+ in 2017 as well), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book (now out in its 4th edition!).
  5. “SIEM Resourcing or How Much the Friggin’ Thing Would REALLY Cost Me?” is a quick framework for assessing the SIEM project (well, a program, really) costs at an organization (a lot more details on this here in this paper).
  6. “Top 10 Criteria for a SIEM?” came from one of the last projects I did when running my SIEM consulting firm in 2009-2011 (for my recent work on evaluating SIEM tools, see this document).
  7. “How to Write an OK SIEM RFP?” (from 2010) contains Anton’s least hated SIEM RFP writing tips (I don’t have any favorite tips since I hate the RFP process)
  8. “An Open Letter to Android or “Android, You Are Shit!”” is an epic rant about my six year long (so far) relationship with Android mobile devices (no spoilers here – go and read it).
  9. “A Myth of An Expert Generalist” is a fun rant on what I think it means to be “a security expert” today; it argues that you must specialize within security to really be called an expert.
  10. Another old checklist, “Log Management Tool Selection Checklist Out!”, holds a top spot – it can be used to compare log management tools during the tool selection process or even a formal RFP process. But let me warn you – this is from 2010.

Disclaimer: all this content was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing.  For my current security blogging, go here.

Also see my past monthly and annual “Top Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015.

by Anton Chuvakin ( at January 04, 2017 07:11 PM

January 03, 2017

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – December 2016

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts/topics this month:
  1. “An Open Letter to Android or “Android, You Are Shit!”” is an epic rant about my six year long (so far) relationship with Android mobile devices (no spoilers here – go and read it).
  2. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here!
  3. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009. Is it relevant now? You be the judge. Succeeding with SIEM requires a lot of work, whether you paid for the software or not. BTW, this post has amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” …
  4. “Simple Log Review Checklist Released!” is often at the top of this list – this aging checklist is still a very useful tool for many people. “On Free Log Management Tools” (also aged a bit by now) is a companion to the checklist (updated version).
  5. My classic PCI DSS Log Review series is always popular! The series of 18 posts covers a comprehensive log review approach (OK for PCI DSS 3+ as well), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book (now in its 4th edition!)
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has about 5X of the traffic of this blog]: 
Current research on security analytics and UBA / UEBA:
Recent research on deception:
Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Previous post in this endless series:

by Anton Chuvakin ( at January 03, 2017 03:46 PM

January 02, 2017

The Lone Sysadmin

Standards, to and with Resolve

As the holiday season has progressed I’ve spent a bunch of time in the car, traveling three hours at a crack to see friends and family in various parts of Midwestern USA. Much of that travel has been alone, my family having decided to ensconce themselves with my in-laws for the full duration of the […]

The post Standards, to and with Resolve appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at January 02, 2017 05:45 AM

January 01, 2017

The DFIR Hierarchy of Needs & Critical Security Controls

As you weigh how best to improve your organization's digital forensics and incident response (DFIR) capabilities heading into 2017, consider Matt Swann's Incident Response Hierarchy of Needs. Likely, at some point in your career (or therapy 😉) you've heard reference to Maslow's Hierarchy of Needs. In summary, Maslow's terms (physiological, safety, belongingness & love, esteem, self-actualization, and self-transcendence) describe a pattern that human motivations generally move through, a pattern that is well represented in the form of a pyramid.
Matt has made great use of this model to describe an Incident Response Hierarchy of Needs, through which your DFIR methods should move. I argue that his powerful description of capabilities extends to the whole of DFIR rather than response alone. From Matt's Github, "the Incident Response Hierarchy describes the capabilities that organizations must build to defend their business assets. Bottom capabilities are prerequisites for successful execution of the capabilities above them:"

The Incident Response Hierarchy of Needs
"The capabilities may also be organized into plateaus or phases that organizations may experience as they develop these capabilities:"

Hierarchy plateaus or phases
As visualizations, these representations really do speak for themselves, and I applaud Matt's fine work. I would like to propose that a body of references and controls may be of use to you in achieving this hierarchy to its utmost. I also welcome your feedback and contributions regarding how to achieve each of these needs and phases. Feel free to submit controls, tools, and tactics you have or would deploy to be successful in these endeavors; I'll post your submission along with your preferred social media handle.
Aspects of the Center for Internet Security Critical Security Controls Version 6.1 (CIS CSC) can be mapped to each of Matt's hierarchical entities and phases. Below I offer one control and one tool to support each entry. Note that there is a level of subjectivity to these mappings and tooling, but the intent is to help you adopt this thinking and achieve this agenda. Following is an example for each one, starting from the bottom of the pyramid.

 INVENTORY - Can you name the assets you are defending?  
Critical Security Control #1: Inventory of Authorized and Unauthorized Devices
Family: System
Control: 1.4     
"Maintain an asset inventory of all systems connected to the network and the network devices themselves, recording at least the network addresses, machine name(s), purpose of each system, an asset owner responsible for each device, and the department associated with each device. The inventory should include every system that has an Internet protocol (IP) address on the network, including but not limited to desktops, laptops, servers, network equipment (routers, switches, firewalls, etc.), printers, storage area networks, Voice Over-IP telephones, multi-homed addresses, virtual addresses, etc.  The asset inventory created must also include data on whether the device is a portable and/or personal device. Devices such as mobile phones, tablets, laptops, and other portable electronic devices that store or process data must be identified, regardless of whether they are attached to the organization’s network." 
Tool option:
Spiceworks Inventory
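As a minimal illustration of auditing such an inventory, the sketch below checks records against the attributes control 1.4 calls for. The field names paraphrase the control text and are not any particular tool's schema:

```python
# Fields control 1.4 asks for in every inventory record (paraphrased):
# network address, machine name, purpose, owner, department, and whether
# the device is portable and/or personal.
REQUIRED_FIELDS = {
    "network_address",
    "machine_name",
    "purpose",
    "owner",
    "department",
    "portable",
}

def missing_fields(record):
    """Return the control-1.4 fields absent from an inventory record (a dict)."""
    return sorted(REQUIRED_FIELDS - record.keys())
```

Running `missing_fields()` over an exported inventory quickly shows which assets you could not actually name under questioning.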

 TELEMETRY - Do you have visibility across your assets?  
Critical Security Control #6: Maintenance, Monitoring, and Analysis of Audit Logs
Family: System
Control: 6.6
"Deploy a SIEM (Security Information and Event Management) or log analytic tools for log aggregation and consolidation from multiple machines and for log correlation and analysis.  Using the SIEM tool, system administrators and security personnel should devise profiles of common events from given systems so that they can tune detection to focus on unusual activity, avoid false positives, more rapidly identify anomalies, and prevent overwhelming analysts with insignificant alerts."
Tool option:  
AlienVault OSSIM

 DETECTION - Can you detect unauthorized activity? 
Critical Security Control #8: Malware Defenses
Family: System
Control: 8.1
"Employ automated tools to continuously monitor workstations, servers, and mobile devices with anti-virus, anti-spyware, personal firewalls, and host-based IPS functionality. All malware detection events should be sent to enterprise anti-malware administration tools and event log servers."
Tool option:
OSSEC Open Source HIDS SECurity

 TRIAGE - Can you accurately classify detection results? 
Critical Security Control #4: Continuous Vulnerability Assessment and Remediation
Family: System
Control: 4.3
"Correlate event logs with information from vulnerability scans to fulfill two goals. First, personnel should verify that the activity of the regular vulnerability scanning tools is itself logged. Second, personnel should be able to correlate attack detection events with prior vulnerability scanning results to determine whether the given exploit was used against a target known to be vulnerable."
Tool option:

 THREATS - Who are your adversaries? What are their capabilities? 
Critical Security Control #19: Incident Response and Management
Family: Application
Control: 19.7
"Conduct periodic incident scenario sessions for personnel associated with the incident handling team to ensure that they understand current threats and risks, as well as their responsibilities in supporting the incident handling team."
Tool option:
Security Incident Response Testing To Meet Audit Requirements

 BEHAVIORS - Can you detect adversary activity within your environment? 
Critical Security Control #5: Controlled Use of Administrative Privileges
Family: System
Control: 5.1
"Minimize administrative privileges and only use administrative accounts when they are required.  Implement focused auditing on the use of administrative privileged functions and monitor for anomalous behavior."
Tool option: 
Local Administrator Password Solution (LAPS)

 HUNT - Can you detect an adversary that is already embedded? 
Critical Security Control #6: Maintenance, Monitoring, and Analysis of Audit Logs       
Family: System
Control: 6.4
"Have security personnel and/or system administrators run biweekly reports that identify anomalies in logs. They should then actively review the anomalies, documenting their findings."
Tool option:
GRR Rapid Response
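As a toy illustration of the kind of anomaly report control 6.4 asks for, the sketch below flags user/action pairs that appear rarely in an aggregated log. The field names are made up; in practice they would be parsed out of whatever log source you aggregate:

```python
from collections import Counter

def rare_events(events, threshold=2):
    """Flag (user, action) pairs seen fewer than `threshold` times.

    `events` is an iterable of dicts with "user" and "action" keys --
    hypothetical field names standing in for parsed log records.
    Rare combinations are candidates for the analyst's review queue.
    """
    counts = Counter((e["user"], e["action"]) for e in events)
    return sorted(pair for pair, n in counts.items() if n < threshold)
```

A biweekly run over the previous fortnight's logs gives the reviewer a short, prioritized list of anomalies to document, rather than the raw event stream.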

 TRACK - During an intrusion, can you observe adversary activity in real time? 
Critical Security Control #12: Boundary Defense
Family: Network
Control: 12.10
"To help identify covert channels exfiltrating data through a firewall, configure the built-in firewall session tracking mechanisms included in many commercial firewalls to identify TCP sessions that last an unusually long time for the given organization and firewall device, alerting personnel about the source and destination addresses associated with these long sessions."
Tool option:
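To illustrate the session-tracking idea in control 12.10, a minimal sketch that flags unusually long sessions might look like the following. The tuple layout is hypothetical, standing in for a firewall's exported session table, and the eight-hour default is an arbitrary example threshold:

```python
def long_sessions(sessions, max_seconds=8 * 3600):
    """Return session records lasting longer than max_seconds.

    `sessions` is an iterable of (src, dst, start_epoch, end_epoch)
    tuples -- a made-up shape for a firewall's session-table export.
    Unusually long sessions may indicate a covert exfiltration channel.
    """
    return [s for s in sessions if s[3] - s[2] > max_seconds]
```

The flagged source and destination addresses are what you would alert personnel about, per the control text.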

 ACT - Can you deploy countermeasures to evict and recover? 
Critical Security Control #20: Penetration Tests and Red Team Exercises       
Family: Application
Control: 20.3
"Perform periodic Red Team exercises to test organizational readiness to identify and stop attacks or to respond quickly and effectively."
Tool option:
Red vs Blue - PowerSploit vs PowerForensics

 Can you collaborate with trusted parties to disrupt adversary campaigns? 
Critical Security Control #19: Incident Response and Management       
Family: Application
Control: 19.5
"Assemble and maintain information on third-party contact information to be used to report a security incident (e.g., maintain an e-mail address of or have a web page"
Tool option:

I've mapped the hierarchy to the controls in the CIS CSC 6.1 spreadsheet, again based on my experience and perspective; yours may differ, but consider similar activity.

CIS CSC with IR Hierarchy mappings

My full mapping of Matt's Incident Response Hierarchy of Needs in the
CIS CSC 6.1 spreadsheet is available here:

I truly hope you familiarize yourself with Matt's Incident Response Hierarchy of Needs and find ways to implement, validate, and improve your capabilities accordingly. Consider that the controls and tools mentioned here are but a starting point and that you have many other options available to you. I look forward to hearing from you regarding your preferred tactics and tools as well. Kudos to Matt for framing this essential discussion so distinctly.

by Russ McRee ( at January 01, 2017 04:00 AM

December 31, 2016

Steve Kemp's Blog

So I'm gonna start doing arduino-things

Since I've got a few weeks off I've decided I need to find a project, or two, to occupy me. Happily the baby is settling in well, mostly he sleeps for 4-5 hours, then eats, before the cycle repeats. It could have been so much worse.

My plan is to start exploring Arduino-related projects. It has been years since I touched hardware, with the exception of building a new PC for myself every 12-48 months.

There are a few "starter kits" you can buy, consisting of a board, and some discrete components such as a bunch of buttons, an LCD-output screen, some sensors (pressure, water, tilt), etc.

There are also some nifty little pre-cooked components you can buy such as:

The appeal of the former is that I can get the hang of marrying hardware with software, and the appeal of the latter is that the whole thing is pre-built, so I don't need to worry about anything complex. Judging by similar builds people have made, the process is more akin to building with Lego than to real hardware assembly.

So, for the next few weeks my plan is to:

  • Explore the various sensors, and tutorials, via the starter-kit.
  • Wire the MP3-playback device to a wireless D1-mini-board.
    • Which will allow me to listen to (static) music stored on an SD-card.
    • And send "next", "previous", "play", "volume-up", etc., from a mobile.

The end result should be that I will be able to listen to music in my living room. Albeit in a constrained fashion (if I want to change the music I'll have to swap out the files on the SD-card). But it's something that's vaguely useful, and something that I think is within my capability, even as a beginner.

I'm actually not sure what else I could usefully do, but I figured I could probably wire up a vibration sensor to another wireless board. The device can sit on the top of my washing machine:

  • If vibration is sensed move into the "washing is on" state.
    • If vibration stops after a few minutes move into the "washing machine done" state.
      • Send an HTTP GET request, which will trigger an SMS/similar.

There's probably more to it than that, but I expect that a simple vibration sensor will be sufficient to allow me to get an alert of some kind when the washing machine is ready to be emptied - and I don't need to poke inside the guts of the washing machine, nor hang reed-switches off the door, etc.
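
The state transitions above can be sketched independently of the hardware (plain Python with invented names and thresholds, not a finished design; on the device the `notify` callback would be the HTTP GET):

```python
class WashingMonitor:
    """Tiny state machine: IDLE -> WASHING -> DONE.

    Feed it (timestamp, vibrating) samples; once vibration has stopped
    for `quiet_secs` after a washing cycle, `notify` is called exactly
    once and the state becomes DONE.
    """
    def __init__(self, notify, quiet_secs=180):
        self.notify = notify
        self.quiet_secs = quiet_secs
        self.state = "IDLE"
        self.last_vibration = None

    def sample(self, now, vibrating):
        if vibrating:
            self.last_vibration = now
            if self.state == "IDLE":
                self.state = "WASHING"
        elif (self.state == "WASHING"
              and self.last_vibration is not None
              and now - self.last_vibration >= self.quiet_secs):
            self.state = "DONE"
            self.notify()
        return self.state
```

The same three-state logic would port straightforwardly to an Arduino sketch or a MicroPython script on the wireless board.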

Anyway the only downside to my plan is that no doubt shipping the toys from AliExpress will take 2-4 weeks. Oops.

December 31, 2016 07:11 PM

The Geekess

Update on Sentiment Analysis of FOSS communities

One of my goals with my new open source project, FOSS Heartbeat, has been to measure the overall sentiment of communication in open source communities. Are the communities welcoming and friendly, hostile, or neutral? Does the bulk of positive or negative sentiment come from core contributors or outsiders? In order to make this analysis scale across multiple open source communities with years of logs, I needed to be able to train an algorithm to recognize the sentiment or tone of technical conversation.

How can machine learning recognize human language sentiment?

One of the projects I’ve been using is the Stanford CoreNLP library, an open source Natural Language Processing (NLP) project. The Stanford CoreNLP takes a set of training sentences (manually marked so that each word and each combined phrase has a sentiment) and it trains a neural network to recognize the sentiment.

The problem with any form of artificial intelligence is that the input into the machine is always biased in some way. For the Stanford CoreNLP, their default sentiment model was trained on movie reviews. That means, for example, that the default sentiment model thinks “Christian” is a very positive word, whereas in an open source project that’s probably someone’s name. The default sentiment model also consistently marks any sentence expressing a neutral technical opinion as having a negative tone. Most people leaving movie reviews either hate or love the movie, and people are unlikely to leave a neutral review analyzing the technical merits of the special effects. Thus, it makes sense that a sentiment model trained on movie reviews would classify technical opinions as negative.

Since the Stanford CoreNLP default sentiment model doesn’t work well on technical conversation, I’ve been creating a new set of sentiment training data that only uses sentences from open source projects. That means that I have to manually modify the sentiment of words and phrases in thousands of sentences that I feed into the new sentiment model. Yikes!
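
For context, CoreNLP's sentiment training data is, as I understand the format, one binarized parse tree per line with a 0-4 sentiment label (0 = very negative, 4 = very positive) on every node. A small helper - my own sketch, not part of CoreNLP - can tally how labels are distributed in such a file:

```python
import re

def label_counts(treebank_line):
    """Count the sentiment labels (0-4) on every node of one
    PTB-style training tree, e.g. '(3 (2 This) (4 (2 is) (4 great)))'.
    Each node opens with '(<label> ', so we match '(' plus one digit."""
    counts = {n: 0 for n in range(5)}
    for m in re.finditer(r"\((\d)\s", treebank_line):
        counts[int(m.group(1))] += 1
    return counts
```

Tallies like this make it easy to see class imbalance in a hand-built training set, such as having far fewer negative than neutral phrases.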

As of today, the Stanford CoreNLP default sentiment model has ~8,000 sentences in their training file. I currently have ~1,200 sentences. While my model isn’t as consistent as the Stanford CoreNLP, it is better at recognizing neutral and positive tone in technical sentences. If you’re interested in the technical details (e.g. specificity, recall, false positives and the like), you can take a look at the new sentiment model’s stats. This blog post will attempt to present the results without diving into guided machine learning jargon.

Default vs New Models On Positive Tone

Let’s take a look at an example of a positive code review experience. The left column is from the default sentiment model in Stanford CoreNLP, which was trained on movie reviews. The right column is from the new sentiment model I’ve been training. The colors of the sentence encode what the two models think the overall tone of the sentence is:

  • Very positive
  • Positive
  • Neutral
  • Negative
  • Very negative

Hey @1Niels 🙂 is there a particular reason for calling it Emoji Code?

I think the earlier guide called it emoji name.

A few examples here would help, as well as explaining that the pop-up menu shows the first five emojis whose names contain the letters typed.

(I’m sure you have a better way of explaining this than me :-).

@arpith I called them Emoji code because that’s what they’re called on Slack’s emoji guide and more commonly only other websites as well.

I think I will probably change the section name from Emoji Code to Using emoji codes and I’ll include your suggestion in the last step.

Thanks for the feedback!


The default model, trained on movie reviews, rated 4 of the 7 sentences as negative and 1 as positive. As you can see, it tends to classify neutral technical talk as having a negative tone, including sentences like “I called them Emoji code because that’s what they’re called on Slack’s emoji guide and more commonly only other websites as well.” It did recognize the sentence “Thanks for the feedback!” as positive, which is good.

The new model, trained on comments from open source projects, rated 1 sentence as negative, 2 as positive, and 1 as very positive. Most of the positive tone of this example comes from the use of smiley faces, which I’ve been careful to train the new model to recognize. Additionally, I’ve been teaching it that an exclamation point ending an overall-positive sentence shifts the tone to very positive. I’m pleased to see it pick up on those subtleties.

Default vs New Models On Neutral Tone

Let’s have a look at a neutral tone code review example. Again, the sentence sentiment color key is:

  • Very positive
  • Positive
  • Neutral
  • Negative
  • Very negative

This seems to check resolvers nested up to a fixed level, rather than checking resolvers and namespaces nested to an arbitrary depth.

I think a inline-code is more appropriate here, something like “URL namespace {} is not unique, you may not be able to reverse all URLs in this namespace”.

Errors prevent management commands from running, which is a bit severe for this case.

One of these should have an explicit instance namespace other than inline-code, otherwise the nested namespaces are not unique.

Please document the check in inline-code.

There’s a list of URL system checks at the end.


Again, the default sentiment model trained on movie reviews classifies neutral review comments as negative, marking 5 of the 6 sentences as negative.

The new model trained on open source communication is a bit mixed on this example, marking 1 sentence as positive and 1 negative, out of 6 sentences. Still, 4 out of 6 sentences were correctly marked as neutral, which is pretty good, given the new model has a training set that is 8 times smaller than the movie review set.

Default vs New Models On Negative Tone

Let’s take a look at a negative example. Please note that this is not a community that I am involved in, and I don’t know anyone from that community. I found this particular example because I searched for “code of conduct”. Note that the behavior displayed on the thread caused the initial contributor to offer to abandon their pull request. A project outsider stated they would recommend their employer not use the project because of the behavior. Another project member came along to ask for people to be more friendly. So quite a number of people thought this behavior was problematic.

Again, the sentiment color code is:

  • Very positive
  • Positive
  • Neutral
  • Negative
  • Very negative

Dude, you must be kidding everyone.

What dawned on you – that for a project to be successful and useful it needs confirmed userbase – was crystal clear to others years ago.

Your “hard working” is little comparing to what other people have been doing for years.

Get humbler, Mr. Arrogant.

If you find this project great, figure out that it is so because other people worked on it before.

Learn what they did and how.

But first learn Python, as pointed above.

Then keep working hard.

And make sure the project stays great after you applied your hands to it.


The default model trained on movie reviews classifies 4 out of 9 sentences as negative and 2 as positive. The new model classifies 2 out of 9 sentences as negative and 2 as positive. In short, the new model needs more work.

It’s unsurprising that the new model doesn’t yet recognize negative sentiment very well, since I’ve been focusing on making sure it can recognize positive sentiment and neutral talk. The training set currently has 110 negative sentences out of 1,205 sentences total. I simply need more negative examples, and they’re hard to find because many subtle personal attacks, insults, and slights don’t use curse words. If you look at the example above, there are no good search terms, aside from the word arrogant, even though the sentences are still put-downs that create an us-vs-them mentality. Despite not using slurs or curse words, many people found the thread problematic.

The best way I’ve settled on to find negative sentiment examples is to look for “communication meta words” or people talking about communication style. My current list of search terms includes words like “friendlier”, “flippant”, “abrasive”, and similar. Some search words like “aggressive” yield too many false positives, because people talk about things like “aggressive optimization”. Once I’ve found a thread that contains those words, I’ll read through it and find the comments that caused the people to ask for a different communication style. Of course, this only works for communities that want to be welcoming. For other communities, searching for the word “attitude” seems to yield useful examples.
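
That mining heuristic can be sketched as a filter (the word lists are illustrative, lifted from the terms mentioned above; this only surfaces threads to read by hand, it does not classify sentiment):

```python
# "communication meta words": people talking about communication style
META_WORDS = {"friendlier", "flippant", "abrasive", "attitude"}
# phrases that make a hit a likely false positive
BENIGN_PHRASES = ["aggressive optimization", "aggressive caching"]

def candidate_threads(comments):
    """Return ids of comments that talk *about* communication style.

    comments: iterable of (comment_id, text) pairs.
    """
    hits = []
    for cid, text in comments:
        lowered = text.lower()
        if any(p in lowered for p in BENIGN_PHRASES):
            continue  # technical use of an otherwise loaded word
        if any(w in lowered.split() for w in META_WORDS) or "aggressive" in lowered:
            hits.append(cid)
    return hits
```

The surviving hits still need a human pass to pull out the specific comments that prompted the request for a different communication style.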

Still, it’s a lot of manual labor to identify problematic threads and fish out the negative sentences that are in those threads. I’ll be continuing to make progress on improving the model to recognize negative sentiment, but it would help if people could post links to negative sentiment examples on the FOSS Heartbeat github issue or drop me an email.

Visualizing Sentiment

Although the sentiment model isn’t perfect, I’ve added visualization for the sentiment of several communities on FOSS Heartbeat, including 24pullrequests, Dreamwidth, systemd, elm, fsharp, and opal.

The x-axis is the date. I used the number of neutral comments in an issue or pull request as the y-axis coordinate, with the error bars indicating the number of positive and negative comments. If a thread had twice as many negative comments as positive comments, it was marked as negative. If it had twice as many positive comments as negative comments, it was marked as positive. If neither sentiment won and more than 80% of the comments were neutral, it was marked as neutral. Otherwise the issue or pull request was marked as mixed sentiment.
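
Written out, the thread-classification rule above looks roughly like this (a sketch of the logic as described, not the actual FOSS Heartbeat code):

```python
def classify_thread(pos, neg, neutral):
    """Classify an issue/PR from its per-comment sentiment counts."""
    total = pos + neg + neutral
    if total == 0:
        return "neutral"
    if neg >= 2 * pos and neg > 0:
        return "negative"   # negative comments dominate 2:1
    if pos >= 2 * neg and pos > 0:
        return "positive"   # positive comments dominate 2:1
    if neutral / total > 0.8:
        return "neutral"    # neither won, and mostly neutral
    return "mixed"
```

For example, a thread with 197 positive and 441 negative comments classifies as negative under this rule, regardless of how many neutral comments surround them.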

Here’s an example:


The sentiment graph is from the 24pullrequests repository. It’s a ruby website that encourages programmers to gift code to open source projects during the 24 days in December before Christmas. One of the open source projects you can contribute to is the 24 pull requests site itself (isn’t that meta!). During the year, you’ll see the site admins filing help-wanted enhancements to update the software that runs the website or tweak a small feature. They’re usually closed within a day without a whole lot of back and forth between the main contributors. The mid-year contributions show up as the neutral, low-comment dots throughout the year. When the 24 pull request site admins do receive a gift of code to the website by a new contributor as part of the 24 pull requests period, they’re quite thankful, which you can see reflected in the many positive comments around December and January.

Another interesting example to look at is negative sentiment in the opal community:


That large spike with 1207 neutral comments, 197 positive comments, and 441 negative comments is the opal community issue to add a code of conduct. Being able to quickly see which threads are turning into flamewars would be helpful to community managers and maintainers who have been ignoring the issue tracker to get some coding done. Once the sentiment model is better trained, I would love to analyze whether communities become more positive or more neutral after a Code of Conduct is put in place. Tying that data to whether more or less newcomers participate after a Code of Conduct is in place may be interesting as well.

There are a lot of real-world problems that sentiment analysis, participation data, and a bit of psychology could help us identify. One common social problem is burnout, which is characterized by an increased workload (stages 1 & 2), working at odd hours (stage 3), and an increase in negative sentiment (stage 6). We have participation data, comment timestamps, and sentiment for those comments, so we would only need some examples of burnout to identify the pattern. By being aware of the burnout stages of our collaborators, we could intervene early to help them avoid a spiral into depression.
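
As a rough illustration (the odd-hours window is invented for this sketch, and real detection would need labeled examples of burnout, as noted above), two of those signals could be screened for like this:

```python
from datetime import datetime

def burnout_signals(comments, odd_start=0, odd_end=6):
    """Rough screen for burnout warning signs from comment metadata.

    comments: list of (iso_timestamp, sentiment) where sentiment is
    'pos', 'neg', or 'neutral'.  Returns (fraction of comments made at
    odd hours, fraction of comments with negative sentiment).
    """
    if not comments:
        return 0.0, 0.0
    odd = sum(1 for ts, _ in comments
              if odd_start <= datetime.fromisoformat(ts).hour < odd_end)
    neg = sum(1 for _, s in comments if s == "neg")
    return odd / len(comments), neg / len(comments)
```

Tracking these fractions per contributor over time, alongside workload, is the kind of pattern-matching that would make early intervention possible.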

A more corporate-focused use might be to identify issues where key customers express frustration and anger, and to focus developers on fixing the squeaky wheel. If FOSS Heartbeat were extended to analyze comments on mailing lists, Slack, Discourse, or Mattermost, companies could get a general idea of customer sentiment after a new software release. Companies could also use the participation data and data about who is merging code to figure out which projects or parts of their code are not being well-maintained, and assign additional help, as the exercism community did.

Another topic of interest to communities hoping to grow their developer base would be identifying the key factors that cause newcomers to become more active contributors to a project. Is it a positive welcome? A mentor suggesting a newcomer tackle a medium-sized issue by tagging them? Does adding documentation about a particularly confusing area cause more newcomers to submit pull requests to that area of code? Does code review from a particularly friendly person cause newcomers to want to come back? Or maybe code review lag causes them to drop off?

These are the kinds of people-centric community questions I would love to answer by using FOSS Heartbeat. I would like to thank Mozilla for sponsoring the project for the last three months. If you have additional questions you’d love to see FOSS Heartbeat answer, I’m available for contract work through Otter Tech. If you’re thankful about the work I’ve put in so far, you can support me through my patreon.

What open source community question would you like to see FOSS Heartbeat tackle? Feel free to leave a comment.

by sarah at December 31, 2016 01:10 AM

December 29, 2016

Anton Chuvakin - Security Warrior

An Open Letter to Android or “Android, You Are Shit!”

Dear Android:

I know you are an operating system and probably cannot (yet?) read on your own. However, recent events compelled me to write this letter to you; an idea for it literally came to me in a dream.

You see, I have carried an Android phone in my pocket since 2010, for almost six years. First a Sony Xperia X10 (eventually running a venerable Android 2.3.7), then another phone, then a Google Nexus 4, and now a Google Nexus 5X (sporting Android 7.1.1). At some point, I traded an iPad for a Google Nexus 9. A [sort of] Android Amazon Fire is my living room Android. I have convinced my wife to start using Android as well and she became a fan too. This represents a multi-year love affair with you, dear Android.

In fact, dear Android, I often had to defend you from packs of rabid Apple fanboys, generally with good results - I either won or we had a draw. Over the years, I had to defend my mobile technology choices from many people: “No, it is NOT an iPhone, it is a Nexus”, “Yes, I chose Android because I like it more than the iPhone, not because it is cheaper”, “Yes, I think Google Now is way more useful than Siri”, etc, etc. I’ve counter-attacked with arguments about the “closed Apple ecosystem”, “one stupid button” and “overpriced devices.” As a person who follows information technology, I am aware of Android’s many strengths, such as better background processing and multi-tasking, security improvements, a flexible user interface, Google Now integration, etc.

However, as I am writing this, my beloved Nexus 5X is no longer with me. In fact, recent events have triggered some soul-searching and ultimately this letter. While doing my soul-searching, I realized that my love affair with you, Android, has some strong dysfunctional notes. You see, I think I always suspected that you are shit.

Over the years, I’ve been using my Android devices carefully and thoughtfully – I never rooted them, never sideloaded apps [well, not to my main personal phone], and I even tried to minimize my use of non-Google applications, etc.  However, as I recall my experiences with Android over the last six years, I am saddened to report that you, Android, never really worked quite right.

In fact, I distilled my reasons to calling you “shit” to one key point: I have never really trusted you, because you have never worked reliably enough to earn such trust.

Indeed, my Sony phone would sometimes crash and reboot, or freeze (“battery out” was the only cure). I of course explained it by “growing pains of Android, the new mobile OS”…after all, you were just in v.2, practically a baby. My Nexus 4 used to crash and shut down as well; apps would often drain the battery to zero without any warning. Furthermore, even nowadays, my Google Nexus 9 tablet (running Android 7.1.1) will occasionally just shut down out of the blue – I just had to restart it earlier today. A few days before my Nexus 5X’s untimely death - just 1 year and 9 days after purchase - the phone rebooted when I launched the Camera app. Such random reboots and crashes were not common with my Nexus 5X, but they did happen periodically. And then finally, my Nexus 5X entered an endless reboot loop a few days after the 7.1.1 OTA update and now has to be replaced. No troubleshooting steps helped.

OK, Google, you want to blame the hardware, perhaps? My experiences over the last 6 years sap the energy from this argument. I used hardware from 3 different makers, all running Android, all having stability problems.

You see, Android, I don’t care about improved malware protection, a faster UI, or the fact that you are “really Linux.” I don’t care about your growing market share. An OS that cannot stay up is a shit OS. And you, my dear OS friend, are shit.

In fact, as my employer gave me an iPhone (first a 4S and now a 5S), a peculiar pattern of behavior developed in my life: if I absolutely, positively had to call an Uber on a dark and stormy night, I would stash my work iPhone in my back pocket, just in case. If I had to show a boarding pass to a permanently angry TSA agent, I would print it or use the iPhone. In fact, I was not even aware of this “if it has to happen – use iPhone” pattern until my wife asked me why I was printing another boarding pass and I said “OK, I guess I can use an iPhone for that” – and so I realized that I just won’t trust my Android device with this.

Dear Android, you may be a full-featured OS now, but you are just not mission-critical. In fact, you are the opposite of that – you are iffy. And the only explanation for why a version SEVEN (not a version TWO with growing pains, mind you) still cannot achieve this reliability is obvious to me – you are shit.

Android, I’ve never really trusted you and I don’t trust you now. I’ve lived with you since your version 2.1 to a current 7.1.1. The only way you can still have "growing pains" after so many years is that you are a shit OS.

Despite all that, dear Android, I will take one more chance with you. When my Google Nexus 5X is repaired and then hopefully continues working for a while, I will stick to using you. But, sorry, no promises beyond that point!

Respectfully ... but distrustfully,

Dr. Anton Chuvakin
(as a consumer, NOT as a technology analyst!)

by Anton Chuvakin ( at December 29, 2016 08:09 PM

December 28, 2016


Check Out My TeePublic Designs

Over the years fans of this blog have asked if I would consider selling merchandise with the TaoSecurity logo. When I taught classes for TaoSecurity from 2005-2007 I designed T-shirts for my students and provided them as part of the registration package. This weekend I decided to exercise my creative side by uploading some designs to TeePublic.

TeePublic offers clothing along with mugs, phone cases, notebooks, and other items.

Two are based on the TaoSecurity logo. One includes the entire logo, along with the company motto of "The Way of Digital Security." The second is a close-up of the TaoSecurity S, which is a modified yin-yang symbol.

Two other designs are inspired by network security monitoring. One is a 1989-era map of MilNet, the United States' military network. This image is found in many places on the Internet, and I used it previously in my classes. The second is a close-up of a switch and router from the TaoSecurity labs. I used this equipment to create packet captures for teaching network security monitoring.

I hope you like these designs. I am particularly partial to the TaoSecurity Logo mug, the TaoSecurity S Logo Mug, and TaoSecurity S Logo t-shirt.

Let me know what you think via comments here.

Update 28 Dec 2016:

Check out the MilNet mug!

by Richard Bejtlich ( at December 28, 2016 04:50 PM

December 26, 2016

Steve Kemp's Blog

I finally made something worthwhile.

So for once I made something useful.


Oiva Adam Kemp.

Happy Christmas, if you believe in that kind of thing.

December 26, 2016 09:34 AM


Choria Playbooks

Today I am very pleased to release something I’ve been thinking about for years and actively working on since August.

After many POCs and thrown-away attempts at this over the years, I am finally releasing a Playbook system that lets you run workflows on your MCollective network – it can integrate with a near-endless set of remote services in addition to your MCollective to create a multi-service playbook system.

This is an early release with only a few integrations, but I think it’s already useful and I’m looking for feedback and integrations to build this into something really powerful for the Puppet ecosystem.

The full docs can be found on the Choria Website, but below you can get some details.


Today playbooks are basic YAML files. Eventually I envision a Service to execute playbooks on your behalf, but today you just run them in your shell, so they are pure data.

Playbooks have a basic flow that is more or less like this:

  1. Discover named Node Sets
  2. Validate the named Node Sets meet expectations such as reachability and versions of software available on them
  3. Run a pre_book task list that lets you do prep work
  4. Run the main tasks task list where you do your work, around every task certain hook lists can be run
  5. Run either the on_success or on_fail task list for notifications to Slack etc
  6. Run the post_book task list for cleanups etc
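
The ordering of those task lists can be sketched in a few lines (illustrative Python only - the real Choria runner also performs the discovery and validation steps before any task runs, and `TaskFailed` plus the dict layout are my own stand-ins):

```python
class TaskFailed(Exception):
    pass

def run_playbook(pb, run_task):
    """Run a playbook's task lists in the documented order.

    pb: dict with optional 'pre_book', 'tasks', 'on_success',
    'on_fail', and 'post_book' lists.  run_task runs one task and
    raises TaskFailed on error.  Returns a log of (list_name, task).
    """
    log = []

    def run_list(name):
        for task in pb.get(name, []):
            log.append((name, task))
            run_task(task)

    try:
        run_list("pre_book")   # prep work
        run_list("tasks")      # the main work
    except TaskFailed:
        run_list("on_fail")    # e.g. notify Slack of failure
    else:
        run_list("on_success")
    finally:
        run_list("post_book")  # cleanups always run
    return log
```

The `try`/`else`/`finally` split is the whole point: exactly one of on_success or on_fail runs, and post_book runs regardless.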

Today a task can be an MCollective request, a shell script or a Slack notification. I imagine this list will grow huge; I am thinking you will want to ping webhooks, or interact with Razor to provision machines and wait for them to finish building, run Terraform or make EC2 API requests. The list of potential integrations is endless and you can use any task in any of the above task lists.

A Node Set is simply a named set of nodes, in MCollective that would be certnames of nodes but the playbook system itself is not limited to that. Today Node Sets can be resolved from MCollective Discovery, PQL Queries (PuppetDB), YAML files with groups of nodes in them or a shell command. Again the list of integrations that make sense here is huge. I imagine querying PE or Foreman for node groups, querying etcd or Consul for service members. Talking to random REST services that return node lists or DB queries. Imagine using Terraform outputs as Node Set sources or EC2 API queries.

In cases where you wish to manage nodes via MCollective but you are using a cached discovery source you can ask node sets to be tested for reachability over MCollective. And node sets that need certain MCollective agents can express this desire as SemVer version ranges and the valid network state will be asserted before any playbook is run.

Playbooks do not have a pseudo programming language in them though I am not against the idea. I do not anticipate YAML to be the end format of playbooks but it’s good enough for today.


I’ll show an example here of what I think you will be able to achieve using these Playbooks.

Here we have a web stack and we want to do Blue/Green deploys against it; sub-clusters are identified by a fact called cluster. The deploy process for a cluster is:

  • Gather input from the user such as cluster to deploy and revision of the app to deploy
  • Discover the Haproxy node using Node Set discovery from PQL queries
  • Discover the Web Servers in a particular cluster using Node Set discovery from PQL queries
  • Verify the Haproxy nodes and Web Servers are reachable and running the versions of agents we need
  • Upgrade the specific web tier using:
    1. Tell the ops room on slack we are about to upgrade the cluster
    2. Disable puppet on the webservers
    3. Wait for any running puppet runs to stop
    4. Disable the nodes on a particular haproxy backend
    5. Upgrade the apps on the servers using appmgr#upgrade to the input revision
    6. Do up to 10 NRPE checks post-upgrade, with 30 seconds between checks, to ensure the load average is GREEN; you’d use a better check here, something app-specific
    7. Enable the nodes in haproxy once NRPE checks pass
    8. Fetch and display the status of the deployed app – like what version is there now
    9. Enable Puppet

Should the task list all FAIL we run these tasks:

  1. Call a webhook on AWS Lambda
  2. Tell the ops room on slack
  3. Run a whole other playbook called deploy_failure_handler with the same parameters

Should the task list PASS we run these tasks:

  1. Call a webhook on AWS Lambda
  2. Tell the ops room on slack
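
Step 6 in the upgrade list - retrying a health check a bounded number of times before declaring failure - is a pattern worth a tiny sketch of its own (generic Python, not Choria's actual task syntax; the injectable `sleep` is just so the logic can be tested without waiting):

```python
import time

def wait_until_green(check, tries=10, delay=30, sleep=time.sleep):
    """Run `check` up to `tries` times, `delay` seconds apart, until it
    reports 'GREEN'.  Returns True on success, False if it never went
    green (the playbook would then run its on_fail list)."""
    for attempt in range(tries):
        if check() == "GREEN":
            return True
        if attempt < tries - 1:
            sleep(delay)  # wait before re-checking
    return False
```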

This example and sample playbooks etc can be found on the Choria Site.


Above is the eventual goal. Today the major missing piece is that MCollective needs to be extended with the ability for Agent plugins to deliver a Macro plugin. A macro might be something like Puppet.wait_till_idle(:timeout => 600); this would be something you call after disabling the nodes when you want to be sure Puppet is making no more changes - you can see the workflow above needs this.

There are no such Macros today; I will add a stop-gap solution as a task that waits for a certain condition, but adding Macros to MCollective is high on my to-do list.

Other than that it works. There is no web service yet, so you run playbooks from the CLI, and the integrations listed above are all that exist; they are quite easy to write, so I’m hoping some early adopters will either give me ideas or send PRs!

This is available today if you upgrade to version 0.0.12 of the ripienaar-mcollective_choria module.

See the Choria Website for much more detail on this feature and a detailed roadmap.

UPDATE: Since posting this blog I had some time and added: Terraform Node Sets, ability to create GET and POST Webhook requests and the much needed ability to assert and wait for remote state.

by R.I. Pienaar at December 26, 2016 09:06 AM

December 25, 2016

System Administration Advent Calendar

Day 25 - Building a Team CLI with Python: One Alternative to ChatOps

Written by: Jan Ivar Beddari (@beddari)
Edited by: Nicholas Valler (@nvaller)


ChatOps is a great idea. Done right, it creates a well-defined collaborative
space where the barriers to entry are low and sharing improvements is quick.
Because of the immediate gains in speed and ease, ChatOps implementations have
a tendency to outgrow their original constraints. If this happens, the amount
of information and interrupts a team member is expected to filter and process
might become unmanageable. To further complicate the issue, reaching that limit
is a personal experience. Some might be fine with continuously monitoring three
dashboards and five chat rooms and still get their work done. Others are more
sensitive, and perhaps end up fighting feelings of guilt or incompetence.

Being sufficiently explicit about what and when information reaches team
members takes time to get right. For this reason, I consider shared filtering
to be an inherent attribute of ChatOps, and a very challenging problem to
solve. As humans think and reason differently given the same input, building
and encouraging collaboration around a visible ‘robot’ perhaps isn’t the best idea.

Defining the Team CLI

As an engineer, taking one step back, what alternative approaches exist that
would bring a lot of the same gains as the ChatOps pattern? We want it to be
less intrusive and not as tied to communication, hopefully increasing the
attention and value given to actual human interaction in chat rooms. To me, one
possible answer is to provide a team centric command line interface. This is
a traditional UNIX-like command line tool to run in a terminal window,
installed across all team members environments. Doing this, we shift our focus
from sharing a centralized tool to sharing a decentralized one. In a
decentralized model, there is an (apparent) extra effort needed to signal or
interrupt the rest of the team. This makes the operation more conscious, which
is a large win.

With a distributed model, where each team member operates in their own context,
a shared cli gives the opportunity to streamline work environments beyond the
capabilities of a chatbot API.

Having decided that this is something we’d like to try, we continue defining a
requirements list:

  • Command line UX similar to existing tools
  • Simple to update and maintain
  • Possible to extend very easily

There’s nothing special or clever about these three requirements. Simplicity is
the non-listed primary goal, using what experience we have to try getting
something working quickly. To further develop these ideas we’ll break down the
list and try to pinpoint some choices we’re making.

Command line UX similar to existing tools

Ever tried sharing a folder full of scripts using git? Scripts don’t really
need docs, and by reading git commits everyone can follow along with updates to
the folder, right? No. It just does not work. Shared tooling needs constraints.
Just pushing /usr/local/bin into git will leave people frustrated at the lack
of coherency. As the cognitive load forces people into forking their own
versions of each tool or script, any gains you were aiming for by sharing them
are lost.

To overcome this we need standards. It doesn’t have to involve much work, as we
already mostly agree on what a good cli UX is - something similar to well-known
tools we already use. Thus we should be able to quickly set some rules and move on:

  • A single top level command tcli is the main entry point of our tool
  • All sub-commands are modules organized semantically using one of the two
    following syntax definitions:

    tcli module verb arguments
    tcli module subject verb arguments

  • Use of options is not defined but every module must implement --help

Unlike a folder of freeform scripts, this is a strict standard. But even so,
the standard is easy to understand and reason about. Its purpose is to create
just enough order and consistency to make sharing and reuse within our team
possible.

Simple to update and maintain

Arguably - also a part of the UX - are updates and maintenance. A distributed
tool shared across a team needs to be super simple to maintain and update. As a
guideline, anything more involved than running a single command would most
likely be off-putting. Having the update process stay out of any critical usage
paths is equally important. We can’t rely on a tool that blocks to check a
remote API for updates in the middle of a run. That would break our most valued
expectation - simplicity. To solve this with a minimal amount of code, we could
reuse some established external mechanism to do update checks.

  • Updates should be as simple as possible, ideally git pull-like.
  • Don’t break expectations by doing calls over the network, shell out to
    package managers or similar.
  • Don’t force updates, stay out of any critical paths.

Possible to extend very easily

Extending the tool should be as easy as possible and is crucial to its long
term success and value. Typically there’s a large amount of hidden specialist
knowledge in teams. Using a collaborative command line tool could help share
that knowledge if the barrier to entry is sufficiently low. In practice, this
means that the main tool must be able to discover and run a wide variety of
extensions or plugins delivered using different methods, even across language
platforms. A great example of this is how it is possible to extend git with
custom sub-commands just by naming them git-my-command and placing them in
your path.

Another interesting generic extension point to consider is running Docker
containers as plugin modules in our tool. There’s a massive amount of tooling
already packaged that we’d be able to reuse with little effort. Just be sure to
maintain your own hub of canonical images from a secure source if you are doing
this for work.

Our final bullet point list defining goals for extensions:

  • The native plugin interface must be as simple as possible
  • Plugins should be discovered at runtime
  • Language and platform independent external plugins are a first class use case

Summoning a Python skeleton

Having done some thinking to define what we want to achieve, it’s time to start
writing some code. But why Python? What about Ruby, or Golang? The answer is
disappointingly simple: for the sake of building a pluggable cli tool, it does
not matter much what language we use. Choose the one that feels most
comfortable and start building. Due to our design choice to be able to plug
anything, reimplementing the top command layer in a different language later
would not be hard.

So off we go using Python. Anyone having spent time with it would probably
recognize some of the projects listed on the site, all of
them highly valued with great documentation available. When I learned that it
also hosts a cli library called Click, I was intrigued by its description:

“Click is a Python package for creating beautiful command line interfaces in a
composable way with as little code as necessary.”

Sounds perfect for our needs, right? Again, the documentation is great as it
doesn’t assume anything and provides ample examples. Let’s try to get ‘hello
tcli’ working!

Hello tcli!

The first thing we’ll need is a working Python dev environment. That could mean
using a virtualenv, a tool and method used for separating libraries and
Python runtimes. If just starting out you could use virtualenvwrapper, which
further simplifies managing these envs. Of course you could also just skip all
this and go with Vagrant, Docker or some other environment, which will be
just fine. If you need help with this step, please ask!

Let’s initialize a project, here using virtualenvwrapper:

mkvirtualenv tcli
mkdir -p ~/sysadvent/tcli/tcli
cd ~/sysadvent/tcli
git init

Then we’ll create the three files that make up our skeleton implementation.
First, tcli/cli.py with our main function cli() that defines our topmost
command:

import click


@click.group()
def cli():
    """tcli is a modular command line tool wrapping and simplifying common
    team related tasks."""

Next an empty file to mark the tcli sub-directory as
containing Python packages:

touch tcli/__init__.py

Last we’ll add a setup.py file that describes our Python package and its dependencies:

from setuptools import setup, find_packages
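A minimal setup.py consistent with what the text describes might look like this sketch; the version number is a placeholder, while click in install_requires and the tcli console script are implied by the article (the tcli.cli:cli entry point assumes the file layout used here):

```python
from setuptools import setup, find_packages

setup(
    name='tcli',
    version='0.1.0',  # placeholder version
    packages=find_packages(),
    # minimal dependencies, as discussed in the text
    install_requires=[
        'click',
    ],
    # the wrapper executable: running `tcli` calls cli() in tcli/cli.py
    entry_points='''
        [console_scripts]
        tcli=tcli.cli:cli
    ''',
)
```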


The resulting file structure should look like this:

tree ~/sysadvent/
~/sysadvent/
└── tcli
    ├── setup.py
    └── tcli
        ├── __init__.py
        └── cli.py
That’s all we need for our ‘hello tcli’ implementation. We’ll install our newly
crafted Python package as being editable - this just means we’ll be able to
modify its code in-place without having to rerun pip:

pip install --editable $PWD

pip will read our setup.py file and first install the minimal needed
dependencies listed in the install_requires array. You might know another
mechanism for specifying Python deps using requirements.txt, which we will not
use here. Last it installs a wrapper executable named tcli pointing to our
cli() function inside tcli/cli.py. It does this using the configuration values
found under entry_points, which are documented in the Python Packaging User
Guide.
Be warned that Python packaging and distribution is a large and sometimes
painful subject. Outside internal dev environments I highly recommend
simplifying your life by using fpm.

That should be all, if the stars aligned correctly we’re now ready for the
inaugural tcli run in our shell. It will show a help message and exit:

(tcli) beddari@mio:~/sysadvent/tcli$ tcli
Usage: tcli [OPTIONS] COMMAND [ARGS]...

  tcli is a modular command line tool wrapping and simplifying common team
  related tasks.

Options:
  --help  Show this message and exit.

Not bad!

Adding commands

As seen above, the only thing we can do so far is specify the --help option,
which is also done by default when no arguments are given. Going back to our
design, remember that we decided to allow only two specific UX semantics in our
command syntax. Add the following code below the cli() function in tcli/cli.py:

@cli.group()
def christmas():
    """This is the christmas module."""


@christmas.command()
@click.option('--count', default=1, help='number of greetings')
@click.argument('name')
def greet(count, name):
    for x in range(count):
        click.echo('Merry Christmas %s!' % name)

At this point, we should treat the @sysadvent
team to the number of greetings we think they deserve:

tcli christmas greet --count 3 "@sysadvent team"

The keys to understanding what is going on here are the @cli.group() and
@christmas.command() lines: greet() is a command belonging to the
christmas group, which in turn belongs to our top level click group. The
Click library uses decorators (a common Python pattern) to achieve this.
Spending some hours with the Click documentation we should now be able to
write quite complex command line tools, using minimal Python boilerplate code.

In our design, we defined goals for how we want to be able to extend our
command line tool, and that is where we’ll go next.

Plugging it together

The Click library is quite popular and there’s a large number of
third party extensions available. One such plugin is click-plugins, which
we’ll use to make it possible to extend our main command line script. In Python
terms, plugins can be separate packages that we’ll be able to discover and load
via setuptools entry_points. In non-Python terms this means we’ll be able to
build a plugin using a separate codebase and have it publish itself as
available for the main script.
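As a sketch of how that publishing could look (the entry point group name tcli.plugins and the oncall module path are illustrative assumptions, not taken from the article), a plugin package's setup.py would register its Click group under the entry point group that the main script scans:

```python
from setuptools import setup, find_packages

setup(
    name='tcli-oncall',
    version='0.1.0',  # placeholder version
    packages=find_packages(),
    install_requires=['click'],
    # the main tcli script iterates over this entry point group at runtime
    # and attaches every command or group it finds as a sub-command
    entry_points='''
        [tcli.plugins]
        oncall=tcli_oncall.cli:oncall
    ''',
)
```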

We want to make it possible for external Python code to register at the
module level of the UX semantics we defined earlier. To make our main tcli
script dynamically look for registered plugins at runtime we’ll need to modify
it a little:

The first 9 lines of tcli/cli.py should now look like this:

from pkg_resources import iter_entry_points

import click
from click_plugins import with_plugins


@with_plugins(iter_entry_points('tcli.plugins'))
@click.group()
def cli():

Next, we’ll need to add click-plugins to the install_requires array in our
setup.py file. Having done that, we reinstall our project using the same
command originally used:

pip install --editable $PWD

Reinstalling is needed here because we’re changing not only code, but also the
Python package setup and dependencies.

To test if our new plugin interface is working, clone and install the example
tcli-oncall project:

cd ~/sysadvent/
git clone
cd tcli-oncall
pip install --editable $PWD

After installing, we have some new example dummy commands and code to play
with:
tcli oncall take "a bath"

Take a look at the setup.py and tcli_oncall/cli.py files in this project
to see how it works.

There’s bash in my Python!

The plugin interface we defined above obviously only works for native Python
code. An important goal for us is however to integrate and run any executable
as part of our cli as long as it is useful and follows the rules we set. In
order to do that, we’ll replicate how git extensions work to add commands
that appear as if they were built-in.

We create a new file tcli/utils.py in our tcli project and add the following
code (adapted from this gist) to it:


import os
import re
import itertools
from stat import S_IMODE, S_ISREG, ST_MODE

def is_executable_posix(path):
    """Whether the file is executable.
    Based on a helper from the stdlib."""
    try:
        st = os.stat(path)
    except os.error:
        return None

    isregfile = S_ISREG(st[ST_MODE])
    isexemode = (S_IMODE(st[ST_MODE]) & 0111)
    return bool(isregfile and isexemode)

def canonical_path(path):
    return os.path.realpath(os.path.normcase(path))

The header imports some modules we’ll need, and next follow two helper
functions. The first checks if a given path is an executable file; the second
normalizes paths by resolving any symlinks in them.

Next we’ll add a function to the same file that uses these two helpers to
search through all directories in our PATH for executables matching a regex
pattern. The function returns a list of pairs of plugin names and executables
we’ll shortly be adding as modules in our tool:

def find_plugin_executables(pattern):
    filepred = re.compile(pattern).search
    filter_files = lambda files: itertools.ifilter(filepred, files)
    is_executable = is_executable_posix

    seen = set()
    plugins = []
    for dirpath in os.environ.get('PATH', '').split(os.pathsep):
        if os.path.isdir(dirpath):
            rp = canonical_path(dirpath)
            if rp in seen:
                continue
            seen.add(rp)
            for filename in filter_files(os.listdir(dirpath)):
                path = os.path.join(dirpath, filename)
                isexe = is_executable(path)

                if isexe:
                    cmd = os.path.basename(path)
                    name = re.search(pattern, cmd).group(1)
                    plugins.append((name, cmd))
    return plugins

Back in our main tcli/cli.py, add another function and a loop that iterates
through the executables we’ve found to tie this together:


import tcli.utils
from subprocess import call

def add_exec_plugin(name, cmd):
    @cli.command(name=name, context_settings=dict(
        ignore_unknown_options=True,
    ))
    @click.argument('cmd_args', nargs=-1, type=click.UNPROCESSED)
    def exec_plugin(cmd_args):
        """Discovered exec module plugin."""
        cmdline = [cmd] + list(cmd_args)
        call(cmdline)

# regex filter for matching executable filenames starting with 'tcli-'
FILTER = "^%s-(.*)$" % __package__
for name, cmd in tcli.utils.find_plugin_executables(FILTER):
    add_exec_plugin(name, cmd)

The add_exec_plugin function adds a little bit of magic: it has an inner
function exec_plugin that represents the command we are adding, dynamically.
The function stays the same every time it is added; only its variable data
changes. Perhaps surprising is that the cmd variable is also addressable inside
the inner function. If you think this sort of thing is interesting, the topics
to read more about are scopes, namespaces and decorators.
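The same capture can be shown with a stdlib-only sketch of the factory pattern (the names here are illustrative, not from the article): each call to the outer function produces a fresh inner function with its own copy of cmd.

```python
def make_runner(cmd):
    """Factory: returns a function that builds a command line for `cmd`."""
    def runner(args):
        # `cmd` is resolved from the enclosing scope, just like in
        # add_exec_plugin's inner exec_plugin function
        return [cmd] + list(args)
    return runner

# each closure keeps its own `cmd`, even after make_runner() returns
run_ls = make_runner('ls')
run_du = make_runner('du')

print(run_ls(['-la']))       # ['ls', '-la']
print(run_du(['-sh', '.']))  # ['du', '-sh', '.']
```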

With a dynamic search and load of tcli- prefixed executables in place, we
should test if it works as it should. Make a simple wrapper script named
tcli-ls in your current directory, and remember to chmod +x it:

#!/bin/bash
ls "$@"

Running the tcli command will now show a new module called ‘ls’ which we can
run, adding the current directory to our PATH for the test:

export PATH=$PATH:.
tcli ls -la --color

Yay, we made ourselves a new way of calling ls. Perhaps time for a break ;-)

An old man and his Docker

As the above mechanism can be used to plug any wrapper in as a module, we now
have a quick way to hook Docker images in as tcli modules. Here’s a simple
example, saved as tcli-builder, that runs Packer:

#!/bin/bash
sha256="..."  # pin the image digest you want to run
docker run --rm -it "hashicorp/packer@sha256:$sha256" "$@"

The last command below should run the entrypoint from hashicorp/packer,
and we’ve reached YALI (Yet Another Layer of Indirection):

export PATH=$PATH:.
tcli builder

Hopefully it is obvious how this can be useful in a team setting. However,
creating bash wrappers for Docker isn’t that great; it would be a better and
faster UX if we could discover what (local?) containers to load as tcli modules
automatically. One idea to consider is an implementation where tcli uses data
from Docker labels following Label Schema. The org.label-schema.name and
org.label-schema.description labels would be of immediate use, representing
the module command name and a single line of descriptive text, suitable for the
top level tcli --help command output. Docker has an easy-to-use Python API,
so anyone considering that as a project should start from there.

Other plugin ideas

The scope of what we could or should be doing with the team cli idea is
interesting - bring your peers in and discuss! For me however, the fact that it
runs locally, inside our personal dev envs, is a large plus.

Here’s a short list of ideas to consider where I believe a team cli could bring
value:

  • git projects management, submodules replacement, templating

    tcli project list # list your teams git repositories, with descriptions
    tcli project create # templating
    tcli project [build|test|deploy]

    This is potentially very useful for my current team at $WORK. I’m planning to
    research how to do this with a control repo pattern.

  • Secrets management

    While waiting for our local Vault implementation team to drink all of their
    coffee, we can try making a consistent interface to (a subset of) the problem.
    Plugging in our current solution (or non-solution) would help, at least.

    If you don’t already have a gpg wrapper I’d look at blackbox.

  • Shared web bookmarks

    tcli web list
    tcli web open dashboard
    tcli web open licensing

    Would potentially save hours of searching in a matter of weeks ;-)

  • On-call management

    E.g as the example tcli-oncall Python plugin we used earlier.

  • Dev environment testing, reporting, management

    While having distributed dev environments is something I’m a big fan of it
    is sometimes hard figuring out just WHAT your coworker is doing. Running
    tests in each team members context to verify settings, versioning and so on
    is very helpful.

    And really there’s no need for every single one of us to have our own,
    non-shared Golang binary unzip update routine.

Wait, what just happened?

We had an idea, explored it, and got something working! At this stage our team
cli can run almost anything, and do so with an acceptable UX, a minimum of
consistency and very little code. Going further, we should probably add some
tests, at least for the functions in tcli.utils. Also, an even thinner design
of the core, where discovering executables is a plugin in itself, would be
better. If someone wants to help make this a real project and iron out these
wrinkles, please contact me!

You might have noticed I didn’t bring up the CLI-versus-ChatOps arguments
again. Truth is there is not much to discuss; I just wanted to present this
idea as an alternative, and the term ChatOps gets people thinking about the
correct problem sets. A fully distributed team would most likely try harder to
avoid centralized services than others. There is quite some power to be had by
designing your main automated pipeline to act just as another team member,
driving the exact same components and tooling as us non-robots.

In more descriptive, practical terms it could be you notifying your team ‘My
last build at commit# failed this way’ through standardized tooling, as
opposed to the more common model where all build pipeline logic and message
generation happens centrally.

by Christopher Webber ( at December 25, 2016 05:00 AM

December 24, 2016

System Administration Advent Calendar

Day 24 - Migrating from mrepo to reposync

Written by: Kent C. Brodie (


We are a RedHat shop (in my case, many CentOS servers, and some RedHat as well). To support the system updates around all of that I currently use mrepo, an open source repository mirroring tool created by Dag Wieers. Mrepo is an excellent yum repository manager that has the ability to house, manage, and mirror multiple repositories. Sadly for many, mrepo’s days are numbered. Today, I’m going to cover why you may need to move from using mrepo, and how to use reposync in its place.

For me, mrepo has thus far done the job well. It allows you to set up and synchronize multiple repositories all on the same single server. In my case, I have been mirroring RedHat 6/7, and Centos 6/7 and it has always worked great. I’ve had this setup for years, dating back to RedHat 5.

While mirroring CentOS with mrepo is fairly trivial, mirroring RedHat updates requires a little extra magic: mrepo uses a clever “registration” process to register a system to RedHat’s RHN (Red Hat Network) service, so that the fake “registered server” can get updates.

Let’s say you have mrepo and wanted to set up a RedHat 6 repository. The key part of this process uses the “gensystemid” command, something like this:

gensystemid -u RHN_username -p RHN_password --release=6Server --arch=x86_64 /srv/mrepo/src/6Server-x86_64/

This command actually logs into RedHat’s RHN, and “registers” the server with RHN. Now that this fake-server component of mrepo is allowed to access RedHat’s updates, it can begin mirroring the repository. If you log into RedHat’s RHN, you will see a “registered server” that looks something like this:

Redhat RHN registered server screen


So what’s the issue? For you RedHat customers, if you’re still using RHN in any capacity, you hopefully have seen this notice by now:

Redhat RHN warning

Putting this all together: If you’re using mrepo to get updates for RedHat servers, that process is going to totally break in just over 7 months. mrepo’s functionality for RedHat updates depends on RedHat’s RHN, which goes away July 31st.

Finally, while mrepo is still used widely, it is worth noting that it appears continued development of mrepo ceased over four years ago. There have been a scattering of forum posts out there that mention trying to get mrepo to work with RedHat’s new subscription-management facility, but I never found a documented solution that works.


Reposync is a command-line utility that’s included with RedHat-derived systems as part of the yum-utils RPM package. The beauty of reposync is its simplicity. At the core, an execution of reposync will examine all of the repositories that the system you’re running it on has available, and downloads all of the included packages to local disk. Technically, reposync has no configuration. You run it, and then it downloads stuff. mrepo on the other hand, requires a bit of configuration and customization per repository.


You simply have to think about the setup differently. In our old model, we had one server that acted as the master repository for all things, whether it was RedHat 6, CentOS 7, whatever. This one system was “registered” multiple times, to mirror RPMS for multiple operating system variants and versions.

In the new model, we have to divide things up. You will need one dedicated server per operating system version. This is because any given server can only download RPMs specific to the operating system version that server is running. Fortunately with today’s world of hosting virtual machines, this isn’t an awful setup, it’s actually quite elegant. In my case, I needed a dedicated server for each of: RedHat 6, RedHat 7, CentOS 6, and CentOS 7.

For the RedHat servers, the elegant part of this solution deals with the fact that you no longer need to use “fake” system registration tools (aka gensystemid). You simply register each of the repository servers using RedHat’s preferred system registration: the “subscription-manager register” command that RedHat provides (with the retirement of RHN coming, the older rhn_register command is going bye-bye). mrepo, at present, does not really have a way to do this using RedHat’s “new” registration mechanism.


The best way for you to see how reposync works is to try it out. For this example, I highly recommend starting with a fresh new server. Because I want to show the changes that occur with subscribing the server to extra channel(s), I am using RedHat 6. You are welcome to use CentOS but note the directory names created will be different and by default the server will already be subscribed to the ‘extras’ channel.

For my example, perform the following steps to set up a basic reposync environment:
  • Install a new server with RedHat 6. A “Basic” install is best.
  • Register the server with RedHat via the subscription-manager command.
  • Do NOT yet add this server to any other RedHat channels.
  • Do NOT yet install any extra repositories like EPEL.
  • Install the following packages via YUM: yum-utils, httpd.
  • Remove /etc/httpd/conf.d/welcome.conf. (The repository will not have an index web page, so by removing this, you’re not redirected to a default apache error document.)
  • Ensure the system’s firewall is set so that you can reach this server via a web browser.

The simplest form of the reposync command will download all packages from all channels your system is subscribed to, and place them in a directory of your choosing.

The following command will download thousands of packages and build a full local RedHat repository, including updates:

reposync -p /var/www/html

The resulting directory structure will look like this:

/var/www/html
`-- rhel-6-server-rpms
    `-- Packages

If you point your web browser to http://repohost/rhel-6-server-rpms/Packages, you should see all of your packages.

Use RedHat’s management portal to add this same server to RedHat’s “Optional packages” channel. For my example, I also installed the EPEL repository to my yum environment.

With the server now ‘subscribed’ to more stuff (RedHat’s optional channel and EPEL), a subsequent reposync command like the one performed above now generates the following:

/var/www/html
|-- epel
|-- rhel-6-server-optional-rpms
|   `-- Packages
`-- rhel-6-server-rpms
    `-- Packages

Note: EPEL is a simpler repo, it does not use a “Packages” subdirectory.

Hopefully this all makes sense now. The reposync command examines what repositories your server belongs to, and downloads them. Reminder: you need one ‘reposync’ server for each major operating system version you have, because each server can only download RPMs and updates specific to the version of the operating system the server is running.


One more step, actually. A repository directory full of RPM files is only the first of two pieces. The second is the metadata. The repository metadata is set up using the “createrepo” command. The output of this will be a “repodata” subdirectory containing critical files YUM requires to sort things out when installing packages.

Using a simple example and our first repository from above, let’s create the metadata:

createrepo /var/www/html/rhel-6-server-rpms/

After which our directory structure now looks like this:

/var/www/html
|-- epel
|-- rhel-6-server-optional-rpms
|   `-- Packages
`-- rhel-6-server-rpms
    |-- Packages
    `-- repodata

You will need to repeat the createrepo command for each of the individual repositories you have. Each time you use reposync, it should be followed by a subsequent execution of createrepo. The final step in all of this to keep current is the addition of cron job entries that usually run reposync and createrepo every night.


Both reposync and createrepo have several command options. Here are some key options that I found useful and explanations as to when or why to use them.



--download-metadata (reposync)

This downloads not only the RPMs, but also extra metadata that may be useful, the most important of which is an XML file that contains version information as it relates to updates. Whether it is present totally depends on the particular repository you’re syncing.


--downloadcomps (reposync)

Also download the comps.xml file. The comps.xml file is critical to deal with “grouping” of packages (for example, “yum groupinstall Development-tools” will not function unless the repository has that file).


--newest-only (reposync)

Only download the latest versions of each RPM. This may or may not be useful, depending on whether you only want the absolute newest of everything, or whether you want ALL versions of everything.



--groupfile comps.xml (createrepo)

If you have a comps.xml file for your repository, you need to tell createrepo exactly where it is.

--workers N (createrepo)

The number of worker processes to use. This is super handy for repositories that have thousands and thousands of packages. It speeds up the createrepo process significantly.


--update (createrepo)

Do an “update” versus a full new repo. This drastically cuts down on the I/O needed to create the final resulting metadata.


The main point of this SysAdvent article was to help those using mrepo today wrap their heads around reposync, and (no thanks to RedHat) to explain why you need to move away from mrepo to something else like reposync if you’re an RHN user. My goal was to provide some simple examples and to help you understand how it works.

If you do not actually have official RedHat servers (for example, you only have CentOS), you may be able to keep using mrepo for quite some time, despite the tool not having had any active development in years. Clearly, a large part of mrepo’s functionality will break after 7/31/2017. Regardless of whether you’re using RedHat or CentOS, reposync is in my opinion an excellent and really simple alternative to mrepo. The only downside is you need multiple servers (one for each OS version), but virtualization helps keep that to a minimal expense.

by Christopher Webber ( at December 24, 2016 05:00 AM

December 23, 2016

System Administration Advent Calendar

Day 23 - That Product Team Really Brought The Room Together

Written by: H. “Waldo” Grunenwald (@gwaldo)
Edited by: Cody Wilbourn (

There are plenty of articles talking about DevOps and Teamwork and Aligning Authority with Responsibility, but what does that look like in practice?

Having been on many different kinds of teams, and having run a Product Team, I will talk about why I think that Product Teams are the best way to create and run products sustainably.

Hey, Didn’t you start with “DevOps Doesn’t Work” last time?

Yes, (yes I did). And I believe every word of it. I consider Product Teams to be a definitive implementation of “Scaling DevOps” which so many people seem to struggle with when the number of people involved scales beyond a conference room.

To my mind, Product Teams are the best way to ensure that responsibility is aligned with authority, that the applications you need are operated sustainably, and that a given application is unlikely to become “Legacy”.

What do you mean “Legacy”?

There is a term that we use in this industry, but I don’t think that I’ve ever seen it be well-defined. In my mind, a Legacy Product is:

  1. Uncared For: Not under active development. Any releases are rare, using old patterns, and are often the result of a security update breaking functionality, causing a fire-drill of fixing dependencies.
  2. In an Orphanage: The people who are responsible for it don’t feel that they own it, but are stuck with it.

If there is a team that actively manages a legacy product, they might not be really equipped to make significant changes. Most of the time they are tasked only with keeping this product barely running, and may have a portfolio of other products in similar state. This “Legacy Team” might carry a connotation of being staffed by “second-string” engineers, and it might be a dumping ground for many apps that aren’t currently in active development.

What are we coming from?

The assumed situation is that there is a product or service that is defined by “business needs”.
A decision is made that these goals are worthwhile, and a Project is defined.
This may be a new product or service, or it may be features to an existing product or service. At some point this Project goes into “Production”, where it is hopefully consumed by users, and hopefully it provides value.

Here’s where things get tricky.

In most companies, the team that writes the product is not the same team that runs the product. This is because many companies organize themselves into departments. Those departments often have technical distinctions like “Development” or “Engineering”, and “Quality Assurance”, and an “Operations” and/or “Systems” groups. In these companies, people are aligned along job function, but each group is responsible for a phase of a product’s lifecycle.

And this is exactly where the heart of the problem is:

The first people who respond to a failure of the application aren’t the application’s developers, creating a business inefficiency:
Your feedback loop is broken.

As a special bonus, some companies organize their development into a so-called “Studio Model”, where a “studio” of developers work on one project. When they are done with that project, it gets handed off to a separate team for operation, and another team will handle “maintenance” development work. That original Studio team may never touch that original codebase again! If you have ever had to maintain or operate someone else’s software, you might well imagine the incentives that this drives, like assumptions that everything is available, and latency is always low!

See, the Studio Model is patterned after Movie and Video Game Studios. This can work well if you are releasing a product that doesn’t have an operational component. Studios make a lot of sense if you’re releasing a film. Some applications like single-player Games, and Mobile Apps that don’t rely on Services are great examples of this.

If your product does have an operational component, this is great for the people on the original Studio team, for whom work is an evergreen pasture. Unfortunately it makes things more painful for everyone who has to deal with the aftermath, including the customers. In reality it’s a really efficient way of turning out Legacy code.

Let’s face it, your CEO doesn’t care that you wrote code real good. They care that the features and products work well, and are available so that they bring in money. They want an investment that pays off.

Having Projects isn’t a problem. But funding teams based on Projects is problematic. You should organize around Products.

Ok, I’ll bite. What’s a Product Team?

Simply put, a Product Team is a team that is organized around a business problem. The Product Team is comprised of people such that it is largely Self-Contained, and collectively the team Owns its own Products. It is “long-lived”, as the intention is that the team is left intact as long as the product is in service.

Individuals on the team will have “Specialties”, but “that’s not my job” doesn’t exist. The QA Engineer specializes in determining ways of assuring that software does what’s expected to. They are responsible for the writing of useful test cases, but they are not limited to the writing of tests. Notably, they’re not solely responsible for the writing of tests. Likewise for Operations Engineers, who have specialties in operating software, infrastructure automation, and monitoring, but they aren’t limited to or solely responsible for those components. Likewise for Software Engineers…

But the Product Team doesn’t only include so-called “members of technical staff”. The Product Team may also need other expertise! Design might be an easy assumption, but perhaps you should have a team member from Marketing, or Payments Receivable, or anyone who has domain expertise in the product!

It’s not a matter of that lofty goal of “Everyone can do everything.” Even on Silo teams, this never works. This is “Everyone knows enough to figure anything out“, and ”Everyone feels enough ownership to be able to make changes."

The people on this team are on THIS team. Having or being an engineer on multiple teams is painful and will cause problems.

You mentioned “Aligning Authority with Responsibility” before…

By having the team be closely-knit, and long-lived, certain understandings need to be had. What I mean is that if you want to have a successful product, and a sustainable lifecycle, there are some understandings that need to take place with regards to the staffing:

  • Engineers have a one-to-one relationship with a Product Team.
  • Products have a one-to-one relationship with a Product Team.
  • A Product Team may have a one-to-many relationship with its Products.
  • A Product Team will have a one-to-one relationship with a Pager Rotation.
  • An Engineer will have a one-to-one membership in their team’s Pager Rotation.

Simply put, having people split among many different teams sounds great in theory, but it never works out well for the individuals. The teams never seem to get the attention required from the Individual Contributors, and an Individual Contributor effectively doubles their number of bosses, having to appease them all.


Some developers might balk at being made to participate in the operation of the product that they’re building. This is a natural reaction.
They’ve never had to do that before. Yes, exactly.
That doesn’t mean that they shouldn’t have to. That is the “we’ve always done it this way” argument.

This topic has already been well-covered in another article in this year’s SysAdvent, in Alice Goldfuss’ “No More On-Call Martyrs”, itself well-followed up by @DBSmasher’s “On Being On-Call”.

In this regard, I say that if one’s sleep is on the line - if you are on the hook for the pager - you will take much more care in your assumptions when building a product than if that is someone else’s problem.

The last thing that amazes me is that this is a pattern that is well-documented in many of the so-called “Unicorn Companies”, whose practices many companies seek to emulate, but somehow “Developers-on-Call” is always argued to be “A Bridge Too Far”.

I would argue that this is one of their keystones.

Who’s in Charge

Before I talk about anything else, I have to make one thing perfectly clear. If you have a role in Functional Leadership (Engineering Manager, Operations Director, etc), your role will probably change.

In Product Teams, the Product Owner decides work to be done and priorities.

Within the team you have the skills that you need to create and run it, delegating functions that you don’t possess to other Product Teams. (DBA’s being somewhat rare, and “DB-as-a-Service” being somewhat common.)

Many Engineering and Operations managers were promoted because they were good at Engineering or Ops. Unfortunately it’s then that it sets in that, in Lindsay Holmwood’s words, “It’s not a promotion, it’s a career change”, and also addressed in this year’s SysAdvent article “Trained Engineers - Overnight Managers (or ‘The Art of Not Destroying Your Company’)” by Nir Cohen.

How many of you miss Engineering, but spend all of your time doing… stuff?

Under an org that leverages Product Teams, Functional Leaders have a fundamentally different role than they did before.

Leadership Roles

Under Product Team paradigm, Product Managers are responsible for the work, while Functional Managers are responsible for passing of knowledge, and overseeing the career growth of Individual Contributors.

| Product Managers | Functional Managers |
| --- | --- |
| Owns Product | IC’s Professional Development |
| Product Direction | Coordinate Knowledge |
| Assign Work & Priority | Keeper of Culture |
| Hire & Fire from Team | Involved in Community |
| Decide Team Standards | Bullshit Detector / Voice of Reason |

Product Managers

The Product Manager “Owns the Product”. They are ultimately responsible for the product successfully meeting business needs. Everything else is in support of that. I must stress that it isn’t necessary that a Product Manager be technical, though it does seem to help.

The product owner is the person who understands the business goals. With that knowledge and those stakes, they assign work and priorities such that they’re aligned with those business goals.

Knowing the specific problems that they’re solving and the makeup of their team, they are responsible for hiring and firing from the team.

Because the Product Team is responsible for their own success, and availability (by which I mean, of course, the Pager), they get to make decisions locally. They get to decide themselves what technologies they want to use and suffer.

Finally, the Product Manager evangelizes their product for other teams to leverage, and helps to on-board them as customers.

Functional Managers

At this point, I expect that the Functional managers are wondering “well what do I do?” Functional Managers aren’t dictating what work is done anymore, but there is still a lot of value that they bring. Their job becomes The People.

I don’t know a single functional manager who has been able to attend to their people’s professional development like they feel that they should.

Since technology decisions are made within the Product Team, the Functional Management has a key role in coordinating knowledge between the members of their Community, keeping track of who’s-using-what, and the relevant successes and pitfalls. When one team is considering a new tool that another is using, or a team is struggling with a tech, the functional manager is well-equipped for connecting people.

Functional Managers are the Keepers of Culture, and are encouraged to be involved in Community. That community-building is both within the company and in their physical region.

Functional managers are crucial for Hiring into the company, and helping Product Managers with hiring skills that they aren’t strong with. For instance, I would run a developer candidate by a development manager for a sanity-check, but for a DBA, I’d be very reliant on a DBA Manager’s expertise and opinion!

Relatedly, the Functional Manager serves as a combination Bullshit Detector and Voice-of-Reason when there are misunderstandings between the Product Owners and their Engineers.

The Reason for Broad Standards

Broad standards are often argued for one of two main reasons: either for “hiring purposes”, where engineers may be swapped relatively interchangeably, or because there is a single Ops team responsible for many products, which doesn’t have the ability to cope with multiple ways of doing things. (Since any one Engineer might be called upon to fix many apps in the dark of the night.)

Unfortunately, app development can often be hampered by those Standards that don’t fit their case and needs.

Hahahaha I’m kidding! What really happens is that Dev teams clam up about what they’re doing. They subvert the “standards” and don’t tell anyone, either pleading ignorance or claiming that they can’t go back and rewrite because of a deadline. Best case is that they run a request for an “exemption” up the flagpole, where Ops gets Over-riden. And Operations is still left with a “standard” and pile of “one-offs”.

Duplicate Effort

Another claimed reason for broad “Standards” is to “reduce the amount of duplicated effort”. While this is a great goal, again, it tends to cause more friction than is necessary.

The problem is the fallacy of assuming that the way a problem was solved for one team will be helpful to another. That solution may be helpful, but assuming that it will be, and making it mandatory, is going to cause unnecessary effort.

At one company, my team ran ELK as a product for other teams to consume. A new team was spun up, and asked about our offerings, but asked my opinion of them using a different service (an externally-hosted ELK-as-a-Service). I was thrilled, in fact! I wanted to see if we were solving the problem in the best way, or even a good way, and to be able to come back later for some lessons-learned!

Scaling Teams

At some point, your product is going to get bigger than everyone can keep in their head. It may be time to split up responsibilities into a new team. But where to draw boundaries? Interrogate them!

A trick that I learned a long time ago for testing your design in Object-Oriented Programming is to ask the object a question: “What are you?” or “What do you do?” If the answer includes an “And”, you have two things. This works well for evaluating both Class and Method design. (I think that this tidbit was from Sandi Metz’s “Practical Object-Oriented Design in Ruby” (aka “POODR”), which I was exposed to by Mark Menard of Enable Labs.)

What Doesn’t Work

Because this can be a change to how teams work, it’s important to be clear about the rules. If there is a misunderstanding about where work comes from, or who the individual contributors work for, or who decides the people who belong to what team, this begins to fall apart.

Having people work for multiple sets of managers is untenable.

Having people quit is an unavoidable problem in any company. Having a functional manager decide by themselves that they’re going to reassign one of your people away from you is worse, because they’re not playing by the rules.

WARNING: Matrix Organizations Considered Harmful

If someone proposes a Matrix Org, you need to be extremely careful. It’s important that you keep a separation of Church and State. Matrix Organizations instantly create a conflict between the different axes of managers, with the tension being centered on the individual contributor who just wants to do good work. A Matrix Org actively adds politics.

All Work comes from Product Management. Functional Management is for Individual Careers and Sharing Knowledge.

This shouldn’t be hard to remember, as the Functional Leaders shouldn’t have work to assign. But it will be hard, because they’ll probably have a lot of muscle-memory around prioritizing and assigning work.

Now, I’m sure a lot of you are skeptical about how a product team actually works. You might just not believe me.

If you properly staff a team, give them direction, authority, and responsibility, they will amaze you.

Getting Started

As with anything, the hardest thing to do is begin.

Identifying Products

An easy candidate is a new initiative for development that may be coming down the pipeline, but if you aren’t aware of any new products, you probably have many “orphaned” products already running within your environment.

As I discussed last year, there are plenty of ways of finding products that are critical, but not actually maintained by anyone. Common places to look are tools around development, like CI, SCM, and Wikis. Also commonly neglected are what I like to call “Insight Tools” like Logging, Metrics, and Monitoring/Alerting. These all tend to be installed and treated as appliances, not receiving any maintenance or attention unless something breaks. Sadly, it means that there’s a lot of value left on the table with these products!

Speaking with Leadership

If you say “I want to start doing Product Teams”, they’re going to think of something along the lines of BizDev. A subtle but important difference is to say that you want to organize a cross-functional team that is dedicated to the creation and long-term operation of the Product.

I don’t know why, but it seems that executive go gooey when they hear the phrase “cross-functional team”. So, go buzz-word away. While you’re at it, try to initiate some Thought Leadership and coin a term with them like “Product-Oriented Development”! (No, of course it doesn’t mean anything…)

What you’re looking for is a commitment to fund the product long-term. The idea is that your team will solve problems centered around a set of problems. The team is of “Your People”, that becomes a “we”. Oddly enough, when you have a team focused and aligned together, you have really built a capital-T “Team”.


The Product Team should be intact and in-development as long as the product is found to be necessary. When the product is retired, the product team may be disbanded, but nobody should be left with the check. Over time, the features should stabilize, the bugs will disappear, and the operation of the application should settle to a low level of effort, even including external updates.

That doesn’t mean that your engineers need to be farmed out to other teams; you should take on new work, and begin development of new products that aid in your space!


I believe that organizing work in Product Teams is one of the best ways to run a responsible engineering organization. By orienting your work around the Product, you are aligning your people to business needs, and the individuals will have a better understanding of the value of their work. By keeping the team size small, they know how the parts work and fit. By everyone operating the product, they feel a sense of ownership, and by being responsible for the product’s availability, they’re much more likely to build resilient and fault-tolerant applications!

It is for these reasons and more, that I consider Product Teams to be the definitive DevOps implementation.


I’d like to thank my friends for listening to me rant, and my editor Cody Wilbourn for their help bringing this article together. I’d also like to thank the SysAdvent team for putting in the effort that keeps this fun tradition going.

Contact Me

If you wish to discuss with me further, please feel free to reach out to me. I am gwaldo on Twitter and Gmail/Hangouts and Steam, and seldom refuse hugs (or offers of beverage and company) at conferences. Death Threats and unpleasantness beyond the realm of constructive Criticism may be sent to:

c/o FBI Headquarters  
935 Pennsylvania Avenue, NW  
Washington, D.C.  

by Christopher Webber ( at December 23, 2016 05:00 AM

December 22, 2016

Evaggelos Balaskas

Elasticsearch, Logstash, Kibana or ELK Crash Course 101

Elasticsearch, Logstash, Kibana or ELK Crash Course 101

Prologue aka Disclaimer

This blog post is the outcome of a Hackerspace Event: Logstash Intro Course that happened a few days ago. I prefer doing workshops vs presentations -as I pray to the Live-Coding Gods- and this is the actual workshop in bullet notes.


For our technical goal we will use my fail2ban !
We will figure out (together) whom I ban with my fail2ban!!!

The results we want to present are:

Date IP Country

To help you with this inquiry, we will use this dataset: fail2ban.gz

If you read through this log you will see that it’s a grep from my messages logs.
So at the beginning we have messages from compressed files … and at the end we have messages from uncompressed files.

But … Let’s begin with our journey !!


For our little experiment we need Java

I Know, I know … not the beverage - the programming language !!

try java 1.7.x

# java -version
java version "1.7.0_111"
OpenJDK Runtime Environment (IcedTea 2.6.7) (Arch Linux build 7.u111_2.6.7-1-x86_64)
OpenJDK 64-Bit Server VM (build 24.111-b01, mixed mode)

In my archlinux machine:

# yes | pacman -S jdk7-openjdk


As of October 26, 2016, all the tools (logstash, elastic, kibana) are at version 5.0.x, the latest.
But we will try the well-known, previously installed versions !!!

From 5.0.x onward … we have: Breaking changes, and you will need Java 8.


Let’s download software

# wget -c

# wget -c

# wget -c


Uncompress and test that logstash can run without a problem:

# unzip
# cd logstash-2.4.1

# ./bin/logstash --version
logstash 2.4.1

# ./bin/logstash --help

Basic Logstash Example

Reminder: Ctrl+c breaks the logstash

# ./bin/logstash -e 'input { stdin { } } output { stdout {} }'

We are now ready to type ‘Whatever’ and see what happens:

# ./bin/logstash -e 'input { stdin { } } output { stdout {} }'
Settings: Default pipeline workers: 4
Pipeline main started


2016-11-15T19:18:09.638Z myhomepc whatever

Ctrl + c
Ctrl + c

^CSIGINT received. Shutting down the agent. {:level=>:warn}
stopping pipeline {:id=>"main"}
Received shutdown signal, but pipeline is still waiting for in-flight events
to be processed. Sending another ^C will force quit Logstash, but this may cause
data loss. {:level=>:warn}
^CSIGINT received. Terminating immediately.. {:level=>:fatal}

Standard Input and Standard Output

In this first example the input is our standard input, that means keyboard
and standard output means our display.

We typed:

whatever

and logstash reports:

2016-11-15T19:18:09.638Z myhomepc whatever

There are three (3) fields:

  1. timestamp : 2016-11-15T19:18:09.638Z
  2. hostname : myhomepc
  3. message : whatever

Logstash Architecture


Logstash architecture reminds me of Von Neumann’s.

Input --> Process --> Output 

In Process we have filter plugins, and in input plugins & output plugins we have codec plugins.

Codec plugins

We can define the data representation (logs or events) via codec plugins. The most basic codec plugin is rubydebug.


eg. logstash -e 'input { stdin { } } output { stdout { codec => rubydebug} }'

# ./bin/logstash -e 'input { stdin { } } output { stdout { codec => rubydebug} }'
Settings: Default pipeline workers: 4
Pipeline main started


       "message" => "whatever",
      "@version" => "1",
    "@timestamp" => "2016-11-15T19:40:46.070Z",
          "host" => "myhomepc"

^CSIGINT received. Shutting down the agent. {:level=>:warn}
stopping pipeline {:id=>"main"}
^CSIGINT received. Terminating immediately.. {:level=>:fatal}


Let’s try the json codec plugin, but now we will try it via a linux pipe:

# echo whatever | ./bin/logstash -e 'input { stdin { } } output { stdout { codec => json }  }' 

Settings: Default pipeline workers: 4
Pipeline main started


Pipeline main has been shutdown
stopping pipeline {:id=>"main"}


# echo -e 'whatever1\nwhatever2\n\n' | ./bin/logstash -e 'input { stdin { } } output { stdout { codec => json_lines }  }'

Settings: Default pipeline workers: 4
Pipeline main started


Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

List of codec

Here is the basic list of codec:


Configuration File

It is not very efficient to run everything from the command line, so we will move to a configuration file:


input {
    stdin { }
}

output {
    stdout {
        codec => rubydebug
    }
}

and run the above example once more:

# echo -e 'whatever1\nwhatever2' | ./bin/logstash -f logstash.conf 

Settings: Default pipeline workers: 4
Pipeline main started

       "message" => "whatever1",
      "@version" => "1",
    "@timestamp" => "2016-11-15T19:59:51.146Z",
          "host" => "myhomepc"
       "message" => "whatever2",
      "@version" => "1",
    "@timestamp" => "2016-11-15T19:59:51.295Z",
          "host" => "myhomepc"

Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

Config Test

Every time you need to test your configuration file for syntax check:

./bin/logstash -f logstash.conf --configtest

Configuration OK

fail2ban - logstash 1st try

Now it’s time to test our fail2ban file against our logstash setup. To avoid the terror of 22k lines, we will test the first 10 lines to see how it works:

# head ../fail2ban | ./bin/logstash -f logstash.conf

Settings: Default pipeline workers: 4
Pipeline main started

       "message" => "messages-20160918.gz:Sep 11 09:13:13 myhostname fail2ban.actions[1510]: NOTICE [apache-badbots] Unban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.784Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 09:51:08 myhostname fail2ban.actions[1510]: NOTICE [apache-badbots] Unban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.966Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 11:51:24 myhostname fail2ban.filter[1510]: INFO [apache-badbots] Found",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.967Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 11:51:24 myhostname fail2ban.actions[1510]: NOTICE [apache-badbots] Ban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.968Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 14:58:35 myhostname fail2ban.filter[1510]: INFO [postfix-sasl] Found",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.968Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 14:58:36 myhostname fail2ban.actions[1510]: NOTICE [postfix-sasl] Ban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.969Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 15:03:08 myhostname fail2ban.filter[1510]: INFO [apache-fakegooglebot] Ignore by command",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.970Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 15:03:08 myhostname fail2ban.filter[1510]: INFO [apache-fakegooglebot] Ignore by command",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.970Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 15:26:04 myhostname fail2ban.filter[1510]: INFO [apache-fakegooglebot] Ignore by command",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.971Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 17:01:02 myhostname fail2ban.filter[1510]: INFO [apache-badbots] Found",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.971Z",
          "host" => "myhomepc"

Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

fail2ban - filter

As we said at the beginning of our journey, we want to check which IPs I ban with fail2ban !!
So we need to filter the messages. Reading through our dataset, we will soon find out that we need lines like:

"messages-20160918.gz:Sep 11 11:51:24 myhostname fail2ban.actions[1510]: NOTICE [apache-badbots] Ban"

so we could use an if-statement (a conditional statement).

fail2ban - Conditionals

You can use the following comparison operators:

    equality: ==, !=, <, >, <=, >=
    regexp: =~, !~ (checks a pattern on the right against a string value on the left)
    inclusion: in, not in 

The supported boolean operators are:

    and, or, nand, xor 

The supported unary operators are:

    !

Expressions can be long and complex.
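As an illustration of combining these operators (this example is not part of the original workshop; the tag name is made up), a conditional in a filter block might look like:

```
filter {
    # keep only Ban events; the =~ operator treats the string as a regexp
    if [message] =~ " Ban " and [message] !~ "fakegooglebot" {
        mutate { add_tag => [ "banned" ] }
    } else {
        drop { }
    }
}
```

The same operators work in output blocks too, e.g. to route events to different destinations.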

fail2ban - message filter

With the above knowledge, our logstash configuration file can now be:


input {
    stdin { }
}

filter {
    if [message]  !~ ' Ban ' {
        drop { }
    }
}

output {
    stdout {
        codec => rubydebug
    }
}

and the results:

# head ../fail2ban | ./bin/logstash -f logstash.conf -v

       "message" => "messages-20160918.gz:Sep 11 11:51:24 myhostname fail2ban.actions[1510]: NOTICE [apache-badbots] Ban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:33:39.858Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 14:58:36 myhostname fail2ban.actions[1510]: NOTICE [postfix-sasl] Ban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:33:39.859Z",
          "host" => "myhomepc"

but we are pretty far away from our goal.

The above approach is just fine for our example, but it is far from perfect or even elegant !
And here is why: the regular expression ‘ Ban ‘ is just that, a regular expression, and it could match more than we intend.

The most elegant approach is to match the entire message and drop everything else. Then we could be certain about the output of the logs.


And here comes grok !!!

and to do that we must learn grok:

Parses unstructured event data into fields

that would be extremely useful. Remember, we have a goal!
We don’t need everything, we need the date, IP & country !!

Grok Patterns

grok works with patterns, which follow the generic rule below:

%{SYNTAX:SEMANTIC}
You can use the online grok debugger (grok heroku) to test your messages/logs/events against grok patterns.

If you click on the left grok-patterns you will see the most common grok patterns.

In our setup:

# find . -type d -name patterns

the latest directory is where our logstash instance keeps the default grok patterns.

To avoid the suspense … here is the full grok pattern:


grok - match

If we run this new setup, we will see something peculiar:


input {
    stdin { }
}

filter {

#    if [message]  !~ ' Ban ' {
#        drop { }
#    }

    grok {
        match => {
            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP} %{HOSTNAME} %{SYSLOGPROG}: %{LOGLEVEL} \[%{PROG}\] Ban %{IPV4}"
        }
    }
}

output {
    stdout {
        codec => rubydebug
    }
}

We will get messages like these:

       "message" => "messages:Nov 15 17:49:09 myhostname fail2ban.actions[1585]: NOTICE [apache-fakegooglebot] Ban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T21:30:29.345Z",
          "host" => "myhomepc",
       "program" => "fail2ban.actions",
           "pid" => "1585"
       "message" => "messages:Nov 15 17:49:31 myhostname fail2ban.action[1585]: ERROR /etc/fail2ban/filter.d/ignorecommands/apache-fakegooglebot -- stdout: ''",
      "@version" => "1",
    "@timestamp" => "2016-11-15T21:30:29.346Z",
          "host" => "myhomepc",
          "tags" => [
        [0] "_grokparsefailure"

It matched some of them, and all the rest were tagged with _grokparsefailure.

We can remove them easily:


input {
    stdin { }
}

filter {

#    if [message]  !~ ' Ban ' {
#        drop { }
#    }

    grok {
        match => {
            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP} %{HOSTNAME} %{SYSLOGPROG}: %{LOGLEVEL} \[%{PROG}\] Ban %{IPV4}"
        }
    }

    if "_grokparsefailure" in [tags] {
        drop { }
    }
}

output {
    stdout {
        codec => rubydebug
    }
}

Using the colon (:) character on a SYNTAX grok pattern creates a new field for grok / logstash.
So we can change the above grok pattern a little bit, to this:


but then again, we only want to keep some fields, like the date and IP, so:

messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] Ban %{IPV4:ip}


input {
    stdin { }
}

filter {

#    if [message]  !~ ' Ban ' {
#        drop { }
#    }

    grok {
        match => {
            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] Ban %{IPV4:ip}"
        }
    }

    if "_grokparsefailure" in [tags] {
        drop { }
    }
}

output {
    stdout {
        codec => rubydebug
    }
}

output will be like this:

       "message" => "messages:Nov 15 17:49:32 myhostname fail2ban.actions[1585]: NOTICE [apache-fakegooglebot] Ban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T21:42:21.260Z",
          "host" => "myhomepc",
          "date" => "Nov 15 17:49:32",
            "ip" => ""

grok - custom pattern

If we want to match something specific with a custom grok pattern, we can simply add one!

For example, we want to match Ban and Unban action:

# vim ./vendor/bundle/jruby/1.9/gems/logstash-patterns-core-2.0.5/patterns/ebal
ACTION (Ban|Unban)

and then our grok match line transforms to:


input {
    stdin { }
}

filter {

#    if [message]  !~ ' Ban ' {
#        drop { }
#    }

    grok {
        match => {
#            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] Ban %{IPV4:ip}"
            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] %{ACTION:action} %{IPV4:ip}"
        }
    }

    if "_grokparsefailure" in [tags] {
        drop { }
    }
}

output {
    stdout {
        codec => rubydebug
    }
}


{
       "message" => "messages:Nov 15 18:13:53 myhostname fail2ban.actions[1585]: NOTICE [apache-badbots] Unban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T21:53:59.634Z",
          "host" => "myhomepc",
          "date" => "Nov 15 18:13:53",
        "action" => "Unban",
            "ip" => ""
}
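A side note: editing files under ./vendor/ is fragile, since they get overwritten when logstash-patterns-core is upgraded. The grok filter also supports a patterns_dir option, so a sketch like the following (the ./patterns path is an assumption) keeps custom patterns outside the vendor tree:

```
# ./patterns/ebal contains the single line:  ACTION (Ban|Unban)
grok {
    patterns_dir => ["./patterns"]
    match => {
        "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] %{ACTION:action} %{IPV4:ip}"
    }
}
```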


We are getting pretty close … the most difficult part (the grok patterns) is over.
We just need to remove any extra fields. We can do that in two ways:

  1. grok - remove_field
  2. mutate - remove_field

We’ll try mutate because it is more powerful.

And for our example/goal we will not use the custom extra Action field, so:


input {
    stdin { }
}

filter {

#    if [message]  !~ ' Ban ' {
#        drop { }
#    }

    grok {
        match => {
            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] Ban %{IPV4:ip}"
#            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] %{ACTION:action} %{IPV4:ip}"
        }
    }

    if "_grokparsefailure" in [tags] {
        drop { }
    }

    mutate {
        remove_field => [ "message", "@version", "@timestamp", "host" ]
    }
}

output {
    stdout {
        codec => rubydebug
    }
}


{
    "date" => "Nov 15 17:49:32",
      "ip" => ""
}

so close !!!

mutate - replace

According to the syslog RFCs (RFC 3164 - RFC 3195):

 In particular, the timestamp has a year, making it a nonstandard format

so most logs don't have a YEAR in their timestamp!

Logstash can add an extra field or replace an existing field:


input {
    stdin { }
}

filter {

#    if [message]  !~ ' Ban ' {
#        drop { }
#    }

    grok {
        match => {
            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] Ban %{IPV4:ip}"
#            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] %{ACTION:action} %{IPV4:ip}"
        }
    }

    if "_grokparsefailure" in [tags] {
        drop { }
    }

    mutate {
        remove_field => [ "message", "@version", "@timestamp", "host" ]
        replace => { "date" => "%{+YYYY} %{date}" }
    }
}

output {
    stdout {
        codec => rubydebug
    }
}

The output:

{
    "date" => "2016 Nov 15 17:49:32",
      "ip" => ""
}
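The same year-prefixing trick can be sketched in Python. The year is hard-coded here for illustration; logstash's %{+YYYY} expands to the year of the event's @timestamp:

```python
from datetime import datetime

def add_year(date_field: str, year: int) -> str:
    # Mimics: replace => { "date" => "%{+YYYY} %{date}" }
    return f"{year} {date_field}"

stamped = add_year("Nov 15 17:49:32", 2016)
print(stamped)  # → 2016 Nov 15 17:49:32

# With the year present, the timestamp parses unambiguously:
parsed = datetime.strptime(stamped, "%Y %b %d %H:%M:%S")
print(parsed.isoformat())  # → 2016-11-15T17:49:32
```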


The only thing missing from our original goal is the country field!

Logstash has a geoip plugin that works perfectly with the MaxMind GeoIP database.

So we need to download the GeoIP database:

# wget -N

The best place to put this file (uncompressed) is under your logstash directory.

Now, it’s time to add the geoip support to the logstash.conf :

  # Add Country Name
  # wget -N
  geoip {
    source => "ip"
    target => "geoip"
    fields => ["country_name"]
    database => "GeoIP.dat"
   # database => "/etc/logstash/GeoIP.dat"
  }

The above goes under the filter section of the logstash conf file.

running the above configuration

# head ../fail2ban | ./bin/logstash -f logstash.conf

should display something like this:

{
     "date" => "2016 Sep 11 11:51:24",
       "ip" => "",
    "geoip" => {
        "country_name" => "Netherlands"
    }
}
{
     "date" => "2016 Sep 11 14:58:36",
       "ip" => "",
    "geoip" => {
        "country_name" => "Russian Federation"
    }
}

We are now pretty close to our primary objective.


It would be nice to somehow translate the geoip -> country_name field to something more useful, like Country.

That’s why we are going to use the rename setting under the mutate plugin:

  mutate {
    rename => { "[geoip][country_name]"  => "Country" }
  }

So let's put them all together:

    geoip {
        source => "ip"
        target => "geoip"
        fields => ["country_name"]
        database => "GeoIP.dat"
    }

    mutate {
        rename => { "[geoip][country_name]"  => "Country" }
        remove_field => [ "message", "@version", "@timestamp", "host", "geoip"]
        replace => { "date" => "%{+YYYY} %{date}" }
    }

Test run it and the output will show something like this:

{
       "date" => "2016 Sep 11 11:51:24",
         "ip" => "",
    "Country" => "Netherlands"
}
{
       "date" => "2016 Sep 11 14:58:36",
         "ip" => "",
    "Country" => "Russian Federation"
}
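The combined geoip + rename step boils down to: look up the IP, nest the result under geoip, then hoist country_name out to a top-level Country field. A purely illustrative Python stand-in (the IP and the one-entry lookup table are invented; the real filter queries the MaxMind database):

```python
# Hypothetical one-entry stand-in for the MaxMind GeoIP database:
GEO_DB = {"93.184.216.34": "United States"}

def enrich(event: dict) -> dict:
    # geoip { source => "ip"  target => "geoip"  fields => ["country_name"] }
    event["geoip"] = {"country_name": GEO_DB.get(event.get("ip", ""), "Unknown")}
    # mutate { rename => { "[geoip][country_name]" => "Country" } }
    event["Country"] = event.pop("geoip")["country_name"]
    return event

print(enrich({"date": "2016 Nov 15 17:49:32", "ip": "93.184.216.34"}))
```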

hurray !!!

Finally, we have completed our primary objective.

Input - Output

Input File

Until now, you have been reading from the standard input, but it’s time to read from a file.
To do so, we must add the below settings under the input section:

file {
    path => "/var/log/messages"
    start_position => "beginning"
}

Testing our configuration file (without giving input from the command line):

./bin/logstash -f logstash.conf

and the output will be something like this:

{
       "path" => "/var/log/messages",
       "date" => "2016 Nov 15 17:49:09",
         "ip" => "",
    "Country" => "United States"
}
{
       "path" => "/var/log/messages",
       "date" => "2016 Nov 15 17:49:32",
         "ip" => "",
    "Country" => "United States"
}

So by changing the input from the standard input to a file path, we added a new extra field:
the path.

Just remove it with mutate -> remove_field as we have already shown above.


Now it’s time to send everything to our elastic search engine:

output {

    # stdout {
    #    codec => rubydebug
    # }

    elasticsearch {
    }

}

Be careful: in our above examples we have removed the @timestamp field,
but for elasticsearch to work, we must enable it again:

remove_field => [ "message", "@version", "host", "geoip"]


Uncompress and run the elastic search engine:

# unzip
# cd elasticsearch-2.4.1/
# ./bin/elasticsearch

elasticsearch is running under:

tcp6       0      0          :::*                    LISTEN      27862/java
tcp6       0      0          :::*                    LISTEN      27862/java

Impressive, but that’s it!


Let’s find out if the elasticsearch engine is running:

$ curl 'localhost:9200/_cat/health?v'

$ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1482421814 17:50:14  elasticsearch yellow          1         1      1   1    0    0        1             0                  -                 50.0%
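For scripting, the _cluster/health endpoint is easier to consume than the _cat output, since it returns JSON. A small Python sketch of checking the status field (the response body is a trimmed, hypothetical sample):

```python
import json

# Trimmed, hypothetical response body from /_cluster/health?pretty=true
body = '{"cluster_name": "elasticsearch", "status": "yellow", "number_of_nodes": 1}'

health = json.loads(body)
# yellow = all primary shards allocated, some replicas unassigned
# (normal on a single-node cluster)
print(health["status"])  # → yellow
```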

$ curl 'localhost:9200/_cat/nodes?v'

host      ip        heap.percent ram.percent load node.role master name
                               7          98 0.50         d      * Hazmat

# curl -s -XGET 'http://localhost:9200/_cluster/health?level=indices' | jq .


Now it’s time to send our data to our elastic search engine, running the logstash daemon with the fail2ban file as input and elastic search as output.


We are almost done. There is only one more step to our 101 course for ELK infrastructure.

And that is the kibana dashboard.

setup kibana

Uncompress and run the kibana dashboard:

 tar xf kibana-4.6.3-linux-x86_64.tar.gz


Now simply open the kibana dashboard on:





December 22, 2016 10:42 PM

That grumpy BSD guy

So somebody is throwing HTML at your sshd. What to do?

Yes, it's exactly as wrong as it sounds. Here's a distraction with bizarre twists for the true log file junkies among you. Happy reading for the holidays!

As will probably not surprise any of my regular readers, I've spent a bit of time recently browsing and processing SSH authentication logs for some of the systems in my care. As usual I browse logs with a view to extracting insights and hopefully at some future date I will be able to present useful material based on analyses of that material.

But sometimes something stands out as just too wrong. Today while browsing archived logs I came across this entry from July:

Jul 8 12:53:17 skapet sshd[88344]: Invalid user <!DOCTYPE from port 57999

That string is the start of an SGML-style document declaration, basically what you would expect to find at the very start of an SGML-ish file such as an HTML document.

Instruct your browser to 'Display source' for this article, and that exact string is the first thing in the HTML source display buffer. But in the context of an authentication log for an SSH service, it's distinctly odd.

And what's more, a little later in the same log I found:

Jul  8 20:59:08 skapet sshd[11083]: Invalid user content="text/html; from port 26240

Again, somebody throwing HTML at my sshd, but this time from a different IP address.

This piqued my interest enough that I decided to take a look at whatever else those jokers had been up to:

[Thu Dec 22 19:40:06] peter@skapet:~$ zgrep /var/log/authlog.23.gz
Jul 8 12:53:17 skapet sshd[88344]: Invalid user <!DOCTYPE from port 57999
Jul 8 12:53:17 skapet sshd[88344]: Failed password for invalid user <!DOCTYPE from port 57999 ssh2
Jul 8 12:53:17 skapet sshd[88344]: Connection closed by port 57999 [preauth]
Jul 8 13:02:15 skapet sshd[85203]: Invalid user PUBLIC from port 58123
Jul 8 13:02:15 skapet sshd[85203]: Failed password for invalid user PUBLIC from port 58123 ssh2
Jul 8 13:02:15 skapet sshd[85203]: Connection closed by port 58123 [preauth]
Jul 8 13:11:13 skapet sshd[25261]: Invalid user XHTML from port 57227
Jul 8 13:11:13 skapet sshd[25261]: Failed password for invalid user XHTML from port 57227 ssh2
Jul 8 13:11:13 skapet sshd[25261]: Connection closed by port 57227 [preauth]
Jul 8 13:20:10 skapet sshd[68619]: Invalid user Strict//EN" from port 25941
Jul 8 13:20:10 skapet sshd[68619]: Failed password for invalid user Strict//EN" from port 25941 ssh2
Jul 8 13:20:10 skapet sshd[68619]: Connection closed by port 25941 [preauth]
Jul 8 13:28:58 skapet sshd[96899]: Invalid user <html from port 48462
Jul 8 13:28:58 skapet sshd[96899]: Failed password for invalid user <html from port 48462 ssh2
Jul 8 13:28:58 skapet sshd[96899]: Connection closed by port 48462 [preauth]
Jul 8 13:37:48 skapet sshd[59363]: Invalid user <meta from port 46496
Jul 8 13:37:48 skapet sshd[59363]: Failed password for invalid user <meta from port 46496 ssh2
Jul 8 13:37:48 skapet sshd[59363]: Connection closed by port 46496 [preauth]
Jul 8 13:46:43 skapet sshd[81970]: Invalid user content="text/html; from port 29652
Jul 8 13:46:43 skapet sshd[81970]: Failed password for invalid user content="text/html; from port 29652 ssh2
Jul 8 13:46:43 skapet sshd[81970]: Connection closed by port 29652 [preauth]
Jul 8 13:55:37 skapet sshd[39952]: Invalid user <title>403 from port 45706
Jul 8 13:55:37 skapet sshd[39952]: Failed password for invalid user <title>403 from port 45706 ssh2
Jul 8 13:55:37 skapet sshd[39952]: Connection closed by port 45706 [preauth]
Jul 8 14:04:33 skapet sshd[68947]: Invalid user Forbidden from port 8465
Jul 8 14:04:33 skapet sshd[68947]: Failed password for invalid user Forbidden from port 8465 ssh2
Jul 8 14:04:34 skapet sshd[68947]: Connection closed by port 8465 [preauth]
Jul 8 14:13:29 skapet sshd[42324]: Invalid user is from port 54112
Jul 8 14:13:29 skapet sshd[42324]: Failed password for invalid user is from port 54112 ssh2
Jul 8 14:13:29 skapet sshd[42324]: Connection closed by port 54112 [preauth]
Jul 8 14:22:20 skapet sshd[83537]: Invalid user <style from port 41269
Jul 8 14:22:20 skapet sshd[83537]: Failed password for invalid user <style from port 41269 ssh2
Jul 8 14:22:21 skapet sshd[83537]: Connection closed by port 41269 [preauth]
Jul 8 14:31:06 skapet sshd[53939]: Invalid user body{margin from port 10587
Jul 8 14:31:06 skapet sshd[53939]: Failed password for invalid user body{margin from port 10587 ssh2
Jul 8 14:31:07 skapet sshd[53939]: Connection closed by port 10587 [preauth]
Jul 8 14:40:08 skapet sshd[24320]: Connection closed by port 58537 [preauth]
Jul 8 14:48:57 skapet sshd[97150]: Invalid user fieldset{padding from port 11375
Jul 8 14:48:57 skapet sshd[97150]: Failed password for invalid user fieldset{padding from port 11375 ssh2
Jul 8 14:48:58 skapet sshd[97150]: Connection closed by port 11375 [preauth]
Jul 8 14:57:55 skapet sshd[38951]: Invalid user 10px from port 43776
Jul 8 14:57:55 skapet sshd[38951]: Failed password for invalid user 10px from port 43776 ssh2
Jul 8 14:57:55 skapet sshd[38951]: Connection closed by port 43776 [preauth]
Jul 8 15:07:53 skapet sshd[72492]: Invalid user \^M from port 58382
Jul 8 15:07:53 skapet sshd[72492]: Failed password for invalid user \^M from port 58382 ssh2
Jul 8 15:07:53 skapet sshd[72492]: Failed password for invalid user \^M from port 58382 ssh2
Jul 8 15:07:54 skapet sshd[72492]: Connection closed by port 58382 [preauth]
Jul 8 15:17:05 skapet sshd[68616]: Invalid user 0 from port 3795
Jul 8 15:17:05 skapet sshd[68616]: Failed password for invalid user 0 from port 3795 ssh2
Jul 8 15:17:05 skapet sshd[68616]: Connection closed by port 3795 [preauth]
Jul 8 15:26:09 skapet sshd[14795]: Connection closed by port 59139 [preauth]
Jul 8 15:35:04 skapet sshd[8499]: Invalid user #header{width from port 16030
Jul 8 15:35:04 skapet sshd[8499]: Failed password for invalid user #header{width from port 16030 ssh2
Jul 8 15:44:12 skapet sshd[17233]: Invalid user 2% from port 2551
Jul 8 15:44:12 skapet sshd[17233]: Failed password for invalid user 2% from port 2551 ssh2
Jul 8 15:44:13 skapet sshd[17233]: Connection closed by port 2551 [preauth]
Jul 8 15:53:05 skapet sshd[36380]: Invalid user 2%;font-family from port 35369
Jul 8 15:53:05 skapet sshd[36380]: Failed password for invalid user 2%;font-family from port 35369 ssh2
Jul 8 15:53:05 skapet sshd[36380]: Connection closed by port 35369 [preauth]
Jul 8 16:02:05 skapet sshd[5384]: Invalid user Verdana, from port 10140
Jul 8 16:02:05 skapet sshd[5384]: Failed password for invalid user Verdana, from port 10140 ssh2
Jul 8 16:02:06 skapet sshd[5384]: Connection closed by port 10140 [preauth]
Jul 8 16:11:27 skapet sshd[80640]: Invalid user #content{margin from port 27941
Jul 8 16:11:27 skapet sshd[80640]: Failed password for invalid user #content{margin from port 27941 ssh2
Jul 8 16:20:24 skapet sshd[71772]: Invalid user <div from port 5467
Jul 8 16:20:24 skapet sshd[71772]: Failed password for invalid user <div from port 5467 ssh2
Jul 8 16:20:25 skapet sshd[71772]: Connection closed by port 5467 [preauth]
Jul 8 16:29:31 skapet sshd[22288]: Invalid user Error</h1></div>\^M from port 2932
Jul 8 16:29:31 skapet sshd[22288]: Failed password for invalid user Error</h1></div>\^M from port 2932 ssh2
Jul 8 16:29:31 skapet sshd[22288]: Connection closed by port 2932 [preauth]
Jul 8 16:38:32 skapet sshd[64659]: Invalid user id="content">\^M from port 44037
Jul 8 16:38:32 skapet sshd[64659]: Failed password for invalid user id="content">\^M from port 44037 ssh2
Jul 8 16:38:33 skapet sshd[64659]: Connection closed by port 44037 [preauth]
Jul 8 16:47:47 skapet sshd[60396]: Invalid user class="content-container"><fieldset>\^M from port 50741
Jul 8 16:47:47 skapet sshd[60396]: Failed password for invalid user class="content-container"><fieldset>\^M from port 50741 ssh2
Jul 8 16:56:46 skapet sshd[84720]: Invalid user Access from port 56868
Jul 8 16:56:46 skapet sshd[84720]: Failed password for invalid user Access from port 56868 ssh2
Jul 8 16:56:46 skapet sshd[84720]: Connection closed by port 56868 [preauth]
Jul 8 17:05:47 skapet sshd[39792]: Invalid user denied.</h2>\^M from port 55262
Jul 8 17:05:47 skapet sshd[39792]: Failed password for invalid user denied.</h2>\^M from port 55262 ssh2
Jul 8 17:05:47 skapet sshd[39792]: Connection closed by port 55262 [preauth]
Jul 8 17:14:42 skapet sshd[2165]: Invalid user do from port 16650
Jul 8 17:14:43 skapet sshd[2165]: Failed password for invalid user do from port 16650 ssh2
Jul 8 17:14:43 skapet sshd[2165]: Connection closed by port 16650 [preauth]
Jul 8 17:23:39 skapet sshd[45938]: Invalid user have from port 15855
Jul 8 17:23:39 skapet sshd[45938]: Failed password for invalid user have from port 15855 ssh2
Jul 8 17:23:39 skapet sshd[45938]: Connection closed by port 15855 [preauth]
Jul 8 17:32:35 skapet sshd[64595]: Invalid user to from port 64962
Jul 8 17:32:35 skapet sshd[64595]: Failed password for invalid user to from port 64962 ssh2
Jul 8 17:32:35 skapet sshd[64595]: Connection closed by port 64962 [preauth]
Jul 8 17:41:30 skapet sshd[99157]: Invalid user this from port 63460
Jul 8 17:41:30 skapet sshd[99157]: Failed password for invalid user this from port 63460 ssh2 

Jul 8 17:41:30 skapet sshd[99157]: Connection closed by port 63460 [preauth]
Jul 8 17:50:27 skapet sshd[60500]: Invalid user or from port 47364
Jul 8 17:50:27 skapet sshd[60500]: Failed password for invalid user or from port 47364 ssh2
Jul 8 17:50:27 skapet sshd[60500]: Connection closed by port 47364 [preauth]
Jul 8 17:59:26 skapet sshd[57379]: Invalid user using from port 60084
Jul 8 17:59:26 skapet sshd[57379]: Failed password for invalid user using from port 60084 ssh2
Jul 8 17:59:26 skapet sshd[57379]: Connection closed by port 60084 [preauth]
Jul 8 18:08:22 skapet sshd[64892]: Invalid user credentials from port 18558
Jul 8 18:08:22 skapet sshd[64892]: Failed password for invalid user credentials from port 18558 ssh2
Jul 8 18:08:22 skapet sshd[64892]: Connection closed by port 18558 [preauth]
Jul 8 18:17:19 skapet sshd[22377]: Invalid user you from port 46996
Jul 8 18:17:19 skapet sshd[22377]: Failed password for invalid user you from port 46996 ssh2
Jul 8 18:17:19 skapet sshd[22377]: Connection closed by port 46996 [preauth]
Jul 8 18:24:50 skapet sshd[98670]: Connection closed by port 40682 [preauth]

The other IP address offered up:

[Thu Dec 22 19:39:24] peter@skapet:~$ zgrep /var/log/authlog.23.gz
Jul 8 16:10:42 skapet sshd[79062]: Connection closed by port 61453 [preauth]
Jul 8 17:01:03 skapet sshd[28839]: Connection closed by port 59520 [preauth]
Jul 8 17:49:47 skapet sshd[1472]: Connection closed by port 39552 [preauth]
Jul 8 18:34:12 skapet sshd[58208]: Connection closed by port 59520 [preauth]
Jul 8 19:19:12 skapet sshd[93151]: Connection closed by port 6465 [preauth]
Jul 8 20:07:33 skapet sshd[84813]: Connection closed by port 39552 [preauth]
Jul 8 20:59:08 skapet sshd[11083]: Invalid user content="text/html; from port 26240
Jul 8 20:59:08 skapet sshd[11083]: Failed password for invalid user content="text/html; from port 26240 ssh2
Jul 8 20:59:08 skapet sshd[11083]: Connection closed by port 26240 [preauth]
Jul 8 21:47:54 skapet sshd[50641]: Connection closed by port 59520 [preauth]
Jul 8 22:38:16 skapet sshd[33990]: Invalid user Forbidden from port 64640
Jul 8 22:38:16 skapet sshd[33990]: Failed password for invalid user Forbidden from port 64640 ssh2
Jul 8 22:38:16 skapet sshd[33990]: Connection closed by port 64640 [preauth]
Jul 8 23:29:47 skapet sshd[84765]: Invalid user is from port 49280
Jul 8 23:29:47 skapet sshd[84765]: Failed password for invalid user is from port 49280 ssh2
Jul 8 23:29:47 skapet sshd[84765]: Connection closed by port 49280 [preauth]
Jul 9 00:18:32 skapet sshd[75290]: Connection closed by port 13952 [preauth]
Jul 9 01:07:11 skapet sshd[15889]: Connection closed by port 16000 [preauth]
Jul 9 01:54:19 skapet sshd[3570]: Connection closed by port 22144 [preauth]
Jul 9 02:38:44 skapet sshd[212]: Connection closed by port 57472 [preauth]
Jul 9 03:24:00 skapet sshd[38938]: Connection closed by port 50304 [preauth]
Jul 9 04:08:22 skapet sshd[60530]: Connection closed by port 21481 [preauth]
Jul 9 04:55:14 skapet sshd[77880]: Connection closed by port 40064 [preauth]
Jul 9 05:45:26 skapet sshd[65360]: Invalid user 0;color from port 20096
Jul 9 05:45:26 skapet sshd[65360]: Failed password for invalid user 0;color from port 20096 ssh2
Jul 9 05:45:26 skapet sshd[65360]: Connection closed by port 20096 [preauth]
Jul 9 06:35:50 skapet sshd[49775]: Invalid user #header{width from port 8320
Jul 9 06:35:50 skapet sshd[49775]: Failed password for invalid user #header{width from port 8320 ssh2
Jul 9 07:24:21 skapet sshd[88261]: Invalid user 2% from port 57472
Jul 9 07:24:21 skapet sshd[88261]: Failed password for invalid user 2% from port 57472 ssh2
Jul 9 07:24:22 skapet sshd[88261]: Connection closed by port 57472 [preauth]
Jul 9 08:16:55 skapet sshd[79482]: Invalid user 2%;font-family from port 57984
Jul 9 08:16:55 skapet sshd[79482]: Failed password for invalid user 2%;font-family from port 57984 ssh2
Jul 9 08:16:55 skapet sshd[79482]: Connection closed by port 57984 [preauth]
Jul 9 09:05:58 skapet sshd[67909]: Connection closed by port 38016 [preauth]
Jul 9 09:57:24 skapet sshd[51227]: Connection closed by port 22144 [preauth]
Jul 9 10:47:35 skapet sshd[89081]: Invalid user 

The sequences become a little easier to read if we extract the user field from the "Invalid user ..." messages:

[Thu Dec 22 20:23:23] peter@skapet:~$ zgrep /var/log/authlog.23.gz | grep Invalid | awk '{print $8}'
<html <meta content="text/html;
10px \^M
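The zgrep | grep | awk pipeline above can be mirrored in a few lines of Python; the sample lines here are copied from the log excerpt earlier in this post:

```python
import re

LOG_LINES = [
    "Jul 8 12:53:17 skapet sshd[88344]: Invalid user <!DOCTYPE from port 57999",
    "Jul 8 13:02:15 skapet sshd[85203]: Invalid user PUBLIC from port 58123",
    "Jul 8 13:11:13 skapet sshd[25261]: Invalid user XHTML from port 57227",
    "Jul 8 13:28:58 skapet sshd[96899]: Invalid user <html from port 48462",
]

# Rough equivalent of: grep Invalid | awk '{print $8}'
def invalid_users(lines):
    return [m.group(1) for line in lines
            if (m := re.search(r"Invalid user (\S+) from port", line))]

print(" ".join(invalid_users(LOG_LINES)))
# → <!DOCTYPE PUBLIC XHTML <html
```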

Now looking at what came from the other IP address we get

[Fri Dec 23 00:51:28] peter@skapet:~$ zgrep /var/log/authlog.23.gz | grep Invalid | awk '{print $8}'

Well, definitely HTML-ish, but no proper document start. Distinctly odd.

The two machines involved are apparently far apart geographically (one in Brazil and the other in Malaysia if the data from whois is anything to go by). It is of course possible that they're still connected somehow, perhaps compromised by the same group of cybercriminals and run by the same operators.

These operators then fatfingered some command or other, and their charges started pushing bizarre streams of HTML at unsuspecting SSH servers in the great elsewhere of the Internet. What they were actually trying to achieve I suspect we'll never know, but HTML was involved.

HTML is also part of the problem in one of the other bizarre phenomena I find at semi-random intervals in the SSH authentication logs.

Here are some samples from the same preserved log file:

[Thu Dec 22 20:16:36] peter@skapet:~$ zgrep "Bad protocol" /var/log/authlog.23.gz
Jul  6 19:03:17 skapet sshd[28549]: Bad protocol version identification 'GET / HTTP/1.1' from port 36147
Jul  8 16:14:07 skapet sshd[89181]: Bad protocol version identification 'GET / HTTP/1.1' from port 49385
Jul  8 20:15:15 skapet sshd[28469]: Bad protocol version identification 'GET / HTTP/1.1' from port 55039
Jul  9 10:56:40 skapet sshd[67430]: Bad protocol version identification 'GET / HTTP/1.1' from port 52504

Again, it's not clear what these operators were attempting to do, but to me this looks like they were expecting to find either a web server or perhaps a web proxy listening on port 22.

Just like the first kind of web-to-ssh stupidity this won't actually get the requester anything and you can safely ignore both kinds of activity if you see traces of them in your own logs.

That is, if you have seen something similar in your own logs and you would care to share, I would like to hear from you, via email or in the comments below.

Even more so if you have any input on the question 'what were these clowns trying to achieve?'.

If the log analyses or related activities turn up any useful insights, you won't need to go far from here to check the results.

Good night and good luck.

Update 2016-12-23: Among the various comments that the initial version of this piece generated, two stand out as particularly useful.

The first came in a Facebook comment on my post about the story there, from my former colleague Egil Mõller, who wrote:

"Maybe they read password guesses to try from a central REST service, but that service has somehow broken, and is serving a default error message (which, since it's REST, is most likely in html)?"

The other came from OpenBSD developer Stuart Henderson, who tweeted:

If the link Twitter gave me doesn't work, here's the plaintext of Stuart's tweet:

"@pitrh the GETs aren't all that odd - could easily be scanning via proxy. I see similar on SMTP too. Not sure about the html though.
1:00 PM - 23 Dec 2016 "

I must admit I had not noticed any GETs in any SMTP related logs on my systems, but now I am honor bound to check. Stuart has given me a task, and I must finish it separately.

Also, I really like Egil's input here, because it fits so well with the data we have. In the meantime I discovered more data from a second host. Unfortunately the actual logs had been rotated out of existence, but it was still possible to piece together data on failed logins from the summaries logsentry sends me.

It appears that one machine apparently located in Hong Kong that had been trying a few logins earlier that month, with no apparent success, started spitting HTML at roughly the same time the other two did.

Here is all the activity in early July 2016 from that host:

Jul  4 01:08:19 delilah sshd[13635]: Failed password for invalid user bob from port 38728 ssh2
Jul  4 01:18:28 delilah sshd[26798]: Failed password for root from port 9224 ssh2
Jul  4 01:18:29 delilah sshd[26798]: Failed password for root from port 9224 ssh2
Jul  4 01:27:13 delilah sshd[24543]: Failed password for invalid user ts from port 9224 ssh2
Jul  4 01:35:54 delilah sshd[16906]: Failed password for invalid user pi from port 9224 ssh2
Jul  7 18:30:48 skapet sshd[89165]: Failed password for invalid user admin from port 43498 ssh2
Jul  7 18:40:33 skapet sshd[35698]: Failed password for invalid user lp from port 9224 ssh2
Jul  7 18:50:21 skapet sshd[21112]: Failed password for root from port 9224 ssh2
Jul  9 11:33:25 delilah sshd[10234]: Failed password for invalid user <!DOCTYPE from port 46959 ssh2
Jul  9 11:42:28 delilah sshd[10230]: Failed password for invalid user PUBLIC from port 9224 ssh2
Jul  9 11:51:36 delilah sshd[25489]: Failed password for invalid user XHTML from port 9224 ssh2
Jul  9 12:09:47 delilah sshd[28023]: Failed password for invalid user <html from port 9224 ssh2
Jul  9 12:18:51 delilah sshd[9873]: Failed password for invalid user <meta from port 9224 ssh2
Jul  9 12:28:01 delilah sshd[13890]: Failed password for invalid user content="text/html; from port 9224 ssh2
Jul  9 12:37:01 delilah sshd[10856]: Failed password for invalid user <title>403 from port 9224 ssh2
Jul  9 12:46:04 delilah sshd[19947]: Failed password for invalid user Forbidden from port 9224 ssh2
Jul  9 13:04:32 delilah sshd[22444]: Failed password for invalid user <style from port 9224 ssh2
Jul  9 13:13:14 delilah sshd[15268]: Failed password for invalid user body{margin from port 9224 ssh2
Jul  9 13:22:31 delilah sshd[19611]: Failed password for invalid user Helvetica, from port 9224 ssh2
Jul  9 13:31:35 delilah sshd[15652]: Failed password for invalid user fieldset{padding from port 59101 ssh2
Jul  9 13:40:39 delilah sshd[15607]: Failed password for invalid user 10px from port 9224 ssh2
Jul  9 13:51:00 delilah sshd[18900]: Failed password for invalid user \^M from port 9224 ssh2
Jul  9 13:51:00 delilah sshd[18900]: Failed password for invalid user \^M from port 9224 ssh2
Jul  9 14:00:09 delilah sshd[24794]: Failed password for invalid user 0 from port 9224 ssh2
Jul  9 14:18:23 delilah sshd[22897]: Failed password for invalid user #header{width from port 9224 ssh2
Jul  9 14:27:11 delilah sshd[12713]: Failed password for invalid user 2% from port 9224 ssh2
Jul  9 14:35:56 delilah sshd[32320]: Failed password for invalid user 2%;font-family from port 9224 ssh2
Jul  9 14:44:46 delilah sshd[30676]: Failed password for invalid user Verdana, from port 9224 ssh2
Jul  9 14:53:46 delilah sshd[12799]: Failed password for invalid user #content{margin from port 9224 ssh2
Jul  9 15:02:29 delilah sshd[19535]: Failed password for invalid user <div from port 9224 ssh2
Jul  9 15:11:17 delilah sshd[6404]: Failed password for invalid user Error</h1></div>\^M from port 9224 ssh2
Jul  9 15:20:08 delilah sshd[2837]: Failed password for invalid user id="content">\^M from port 9224 ssh2
Jul  9 15:29:13 delilah sshd[7831]: Failed password for invalid user class="content-container"><fieldset>\^M from port 9224 ssh2
Jul  9 15:38:05 delilah sshd[12172]: Failed password for invalid user Access from port 9224 ssh2
Jul  9 15:46:56 delilah sshd[23460]: Failed password for invalid user denied.</h2>\^M from port 9224 ssh2
Jul  9 15:55:48 delilah sshd[19891]: Failed password for invalid user do from port 9224 ssh2
Jul  9 16:13:29 delilah sshd[5999]: Failed password for invalid user to from port 9224 ssh2
Jul  9 16:22:20 delilah sshd[17535]: Failed password for invalid user this from port 9224 ssh2
Jul  9 16:31:11 delilah sshd[8428]: Failed password for invalid user or from port 9224 ssh2
Jul  9 16:40:00 delilah sshd[2522]: Failed password for invalid user using from port 9224 ssh2
Jul  9 16:48:52 delilah sshd[15034]: Failed password for invalid user credentials from port 9224 ssh2
Jul  9 16:57:43 delilah sshd[14515]: Failed password for invalid user you from port 53073 ssh2

If we again extract only the user name field, we get:


From this we see that this host was already busy poking us at long intervals with 'likely' user names, then suddenly started spewing HTML instead, in roughly the same time frame as the other two.

Seen from my perch here, this serves to support Egil's suggestion that a misbehaving common back end system is what caused the odd activity we're seeing.

And the apparent coordination brings to mind the Hail Mary Cloud incidents that we reported on earlier.

I suppose further digging in the logs is warranted.

If you would like to join me in the hunt for more of this, please let me know.

by Peter N. M. Hansteen ( at December 22, 2016 08:29 PM

December 21, 2016

No leap second simulations this year

As some of my readers already know, I changed jobs in November: I left Opera Software to join Telenor Digital. We have decided not to run any leap second simulation here, so I am not going to publish anything on the subject this year. You can still refer to the post The leap second aftermath for some suggestions I wrote after the latest leap second we had in June/July 2015.

Good luck!

Tagged: DevOps, leap second, ntp, Opera, Sysadmin, Telenor Digital

by bronto at December 21, 2016 09:21 PM

December 20, 2016

OpenStack: Quick and automatic instance snapshot backup and restore (and before an apt upgrade) with nova backup

This is a guide that shows you how to create OpenStack instance snapshots automatically, quickly and easily. This allows you to create a full backup of the entire instance. This guide has a script that makes creating snapshots from an OpenStack VM automatic via cron. The script uses the `nova backup` function, therefore it also has retention and rotation of the backups. It also features an option to create a snapshot before every apt action (upgrade/install/remove). This way, you can easily restore from the snapshot when something goes wrong after an upgrade. Snapshots are very useful for restoring the entire instance to an earlier state. Do note that this is not the same as a file-based backup: you can't select a few files to restore, it's all or nothing.

December 20, 2016 12:00 AM

December 19, 2016


Sysadmins and risk-management

This crossed my timeline today:

This is a risk-management statement that contains all of a sysadmin's cynical-bastard outlook on IT infrastructure.

Disappointed because all of their suggestions for making the system more resilient to failure are shot down by management. Or only some of them are, which might as well be all, since some disasters remain uncovered. Commence drinking heavily to compensate.

Frantically busy because they're trying to mitigate all the failure-modes their own damned self using not enough resources, all the while dealing with continual change as the mission of the infrastructure shifts over time.

A good #sysadmin always expects the worst.

Yes, we do. Because all too often, we're the only risk-management professionals a system has. We understand the risks to the system better than anyone else. A sysadmin who plans for failure is one who isn't first on the block when the outage-enraged user-base calls for a beheading.

However, there are a few failure-modes in this setup that many, many sysadmins fall foul of.

Perfection is the standard.

And no system is perfect.

Humans are shit at gut-level risk-assessment, part 1: If you've had friends eaten by a lion, you see lions everywhere.

This abstract threat has been made all too real, and now lions. Lions everywhere. For sysadmins it's things like multi-disk RAID failures, UPS batteries blowing up, and restoration failures because an application changed its behavior and the existing backup solution no longer was adequate to restore state.

Sysadmins become sensitized to failure. Those once-in-ten-years failures, like datacenter transfer-switch failures or Amazon region-outages, seem immediate and real. I knew a sysadmin who was paralyzed in fear over a multi-disk RAID failure in their infrastructure. They used big disks that weren't 100% 'enterprise' grade. Recoveries from a single-disk failure were long as a result. Too long. A disk going bad during the recovery was a near certainty from their point of view, never mind that the disks in question were less than 3 years old and the RAID system they were using ran bad-block detection as a background process. That window of outage was too damned long.

Humans are shit at gut-level risk-assessment, part 2: Leeroy Jenkins sometimes gets the jackpot, so maybe you'll get that lucky...

This is why people think they can win mega-million lotteries or beat the casino at roulette. Because sometimes, you have to take a risk for a big payoff.

To sysadmins who have had friends eaten by lions, this way of thinking is completely alien. This is the developer who suggests swapping out the quite functional MySQL databases for Postgres. Or the peer sysadmin who really wants central IT to move away from forklift SAN-based disk-arrays for a bunch of commodity hardware, FreeBSD, and ZFS.

Mm hm. No.

Leeroy Jenkins management and lion-eaten sysadmins make for really unhappy sysadmins.

When it isn't a dev or a peer sysadmin asking, but a manager...

Sysadmin team: It may be a better solution. But do you know how many lions are lurking in the transition process??

Management team: It's a better platform. Do it anyway.

Cue heavy drinking as everyone prepares to lose a friend to lions.

This is why I suggest rewording that statement:

A good #sysadmin always expects the worst.
A great #sysadmin doesn't let that rule their whole outlook.

A great sysadmin has awareness of business risk, not just IT risks. A sysadmin who has been scarred by lions and sees large felines lurking everywhere will be completely miserable in an early or mid-stage startup. In an early stage startup, the big risk on everyone's mind is running out of money and losing their jobs; so that once-in-three-years disaster we feel so acutely is not the big problem it seems. Yeah, it can happen, and it could shutter the company if it does; but the money spent remediating that problem would be better spent expanding marketshare enough that we can assume we'll still be in business 2 years from now. A failure-obsessed sysadmin will not find job satisfaction in such a workplace.

One who has awareness of business risk will wait until the funding runway is long enough that pitching redundancy improvements will actually defend the business. This is a hard skill to learn, especially for people who've been pigeon-holed as worker-units their entire career. I find that asking myself one question helps:

How likely is it that this company will still be here in 2 years? 5? 7? 10?

If the answer to that is anything less than 'definitely', then there are failures that you can accept into your infrastructure.

by SysAdmin1138 at December 19, 2016 04:14 PM

December 15, 2016

Debian Administration

Installing the Go programming language on Debian GNU/Linux

Go is an open source programming language that makes it easy to build simple, reliable, and efficient software. In this brief article we'll show how to install binary releases of the compiler/toolset, and test them.
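A manual install of a binary release typically looks like the sketch below; the 1.7.4 version string is an assumption of mine, so check the official download page for the current release and its checksum before fetching.

```shell
#!/bin/sh
# Assumed release; adjust to the current version from the downloads page.
GO_VERSION="1.7.4"
TARBALL="go${GO_VERSION}.linux-amd64.tar.gz"

# The actual install steps (commented out here, as they need network
# access and root):
#   wget "https://storage.googleapis.com/golang/${TARBALL}"
#   sudo tar -C /usr/local -xzf "${TARBALL}"
#   export PATH="$PATH:/usr/local/go/bin"
#   go version
echo "would fetch ${TARBALL}"   # echoed so the sketch has visible output
```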

by Steve at December 15, 2016 08:25 AM

December 14, 2016

Carl Chenet

Feed2tweet 0.8, tool to post RSS feeds to Twitter, released

Feed2tweet 0.8, a self-hosted Python app to automatically post RSS feeds to the Twitter social network, was released on December 14th.

With this release Feed2tweet now manages hashtags smartly, adding as many as possible given the size of the tweet.

Two new options are also available:

  • --populate-cache to retrieve the last entries of the RSS feeds and store them in the local cache file without posting them on Twitter
  • --rss-sections to display the available sections in the RSS feed, allowing you to use these section names in your tweet format (see the official documentation for more details)

Feed2tweet 0.8 is already in production for Le Journal du hacker, a French-speaking Hacker News-like website, for a French-speaking job board, and for this very blog.


What’s the purpose of Feed2tweet?

Some online services offer to convert your RSS entries into Twitter posts. These services are usually unreliable, slow, and don't respect your privacy. Feed2tweet is a self-hosted Python app; the source code is easy to read and the official documentation is available online with lots of examples.

Twitter Out Of The Browser

Have a look at my Github account for my other Twitter automation tools:

  • Retweet, retweet all tweets (or a filtered subset) from one Twitter account to another to spread content.
  • db2twitter, get data from a SQL database (several are supported), build tweets and send them to Twitter.
  • Twitterwatch, monitor the activity of your Twitter timeline and warn you if no new tweet appears.

What about you? Do you use tools to automate the management of your Twitter account? Feel free to give me feedback in the comments below.

by Carl Chenet at December 14, 2016 11:00 PM

December 13, 2016


An update on my Choria project

Some time ago I mentioned that I am working on improving the MCollective Deployment story.

I started a project called Choria that aimed to massively improve the deployment UX and yield a secure and stable MCollective setup for those using Puppet 4.

The aim is to make installation quick and secure. Towards that, it seems a common end-to-end install from scratch, by someone new to the project, using a clustered NATS setup, can take less than an hour; this is a huge improvement.

Further, I’ve had really good user feedback, especially around NATS. One user reports 2000 nodes on a single NATS server consuming 300MB of RAM and being very performant, much more so than the previous setup.

It’s been a few months; this is what’s changed:

  • The module now supports every OS AIO Puppet supports, including Windows.
  • Documentation is available on, and installation should take about an hour at most.
  • The PQL language can now be used to do completely custom infrastructure discovery against PuppetDB.
  • Many bugs have been fixed; many things have been streamlined and made easier to get going with, thanks to better defaults.
  • Event Machine is not needed anymore.
  • A number of POC projects have been done to flesh out next steps, things like a very capable playbook system and a revisit to the generic RPC client, these are on GitHub issues.

Meanwhile I am still trying to get to a point where I can take over maintenance of MCollective again. At first Puppet Inc was very open to the idea, but I’m afraid it’s been 7 months and it’s getting nowhere; calls for cooperation are just being ignored. Unfortunately I think we’re getting pretty close to a fork being the only productive next step.

For now though, I’d say the Choria plugin set is production-ready and stable; anyone using Puppet 4 AIO should consider using it – it’s about the only working way to get MCollective on FOSS Puppet now, given the state of the other installation options.

by R.I. Pienaar at December 13, 2016 06:49 PM

Geek and Artist - Tech

“We’re doing our own flavour of Agile/Scrum”

I won’t descend into hyperbole and say you should run, shrieking and naked into the dark night, when you hear these words. But, it’s worth pondering what exactly it means. I think I’ve (over)used this phrase myself plenty over the years and right now find myself examining why so many people find themselves needing to invent their own version of well accepted software workflow methodologies.

You might say “we just pick the parts that work for us” or “we continually iterate on our workflow and so it is constantly evolving rather than sticking to static definitions of Agile”, or “we haven’t found estimations useful”. Many teams that have a significant infrastructure component to their work find themselves split between Scrum and Kanban. I always imagine that traditional or strict Scrum works best when you are working on a single application and codebase, with pretty well restricted scope and limited technologies in play. I actually crave working in such an environment, since working in teams with broad missions and wide varieties of technologies can make organising the work extremely difficult. At the same time you don’t want to split a reasonably-sized team of 6-8 people into teams of 1-2 people just to make their mission/vision clear.

Some reasons I think “custom Agile/Scrum” happens:

  • Most or all of the team has never actually worked in a real Waterfall development model, and can’t appreciate the reason for all the Agile/Scrum rituals, processes and ideals. This will continue to happen more and more frequently, and is almost guaranteed if you are dealing with those dreaded Millennials.
  • Estimations are hard, and we’d rather not do them.
  • Backlog grooming is hard, and we don’t want to waste time on it. Meeting fatigue in general kills a lot of the rituals.
  • Unclear accountability on the team. Who does it fall on when we don’t meet our goals? What is the outcome?
  • Too many disparate streams of work to have one clear deliverable at the end of the sprint.
  • Various factors as mentioned in the previous paragraph leading to a hybrid Scrum-Kanban methodology being adopted.
  • The need to use electronic tools e.g. Jira/Trello/Mingle/TargetProcess (have you heard of that one?) etc, rather than old fashioned cards or sticky notes on the wall. Conforming to the constraints of your tool of choice (or lack of choice) inevitably make a lot of the rituals much harder. Aligning processes with other teams (sometimes on other continents) also adds to the friction.

So anyway, why is any of this a problem? Well, let’s consider for a moment what the purpose of these workflow tools and processes is. Or at least, in my opinion (and if you disagree, please let me know and let’s discuss it!):

  • Feedback
  • Learning
  • Improvement

I think these three elements are so important to a team, whether you implement Scrum or Kanban or something else. If you pick and choose from different agile methodologies, you’d better be sure you have all of these elements present. Let me give some examples of where the process fails.

You have a diverse team with a broad mission, and various roles like backend, frontend, QA, design etc. Not everybody is able to work on the same thing so your sprint goals look like five or 10 different topics. At the end of the sprint, maybe 70-80% of them are completed, but that’s ok right? You got the majority done – time to celebrate, demo what you did finish and move what’s left over to the next sprint.

Unfortunately what this does is create a habit of acceptable failure. You become accustomed to never completing all the sprint goals, moving tickets over to the following sprint and not reacting to it. Quarterly goals slip but that’s also acceptable. You take on additional “emergency” work into the sprint without much question as slipping from 70% to 65% isn’t a big difference. You’ve just lost one of your most important feedback mechanisms.

If you had a single concrete goal for the sprint, and held yourself to delivering that thing each sprint, you would instead build up the habit of success being normal. The first sprint where that single goal is not delivered gives you a huge red flag that something went wrong and there is a learning opportunity. What did you fail to consider in this sprint that caused the goal to be missed? Did you take on some emergency work that took longer than expected? It’s also a great opportunity for engineers to improve how they estimate their work and how they prioritise. It also facilitates better discussions around priorities – if you come to me and ask me to complete some “small” task, I will ask you to take on responsibility for the entire sprint goal being missed, and explaining that to the stakeholders. 100% to 0% is a much harder pill to swallow than 85% to 80% – and in the latter case I believe these important conversations just aren’t happening.

But let’s say Scrum really doesn’t work for you. I think that’s totally fine, as long as you own up to this and replace the feedback mechanisms of Scrum with that of something else – but not stay in some undefined grey area in the middle. Two-week (or some alternative time period) sprints may not work well, or you might deliver to production every week, or every three weeks. Something that doesn’t align with the sprint. Now you are in a situation where you are working in sprints/iterations that are just arbitrary time containers for work but aren’t set up to deliver you any valuable feedback. Don’t stay in the grey zone – own up to it and at least move to something else like Kanban.

But if you are using Kanban, do think about what feedback mechanisms you now need. Simply limiting work in progress and considering your workflow a pipeline doesn’t provide much intelligence to you about how well it is functioning. Measuring cycle time of tasks is the feedback loop here that tells you when things are going off the rails. If you get to the point where your cycle time is pretty consistent but you find your backlog is growing more and more out of control, you have scope creep or too much additional work is making its way into your team. Either way there is a potential conversation around priorities and what work is critical to the team’s success to be had. Alternatively if cycle time is all over the place then the team can learn from these poor estimates and improve their thought process around the work. Having neither cycle time nor sprint goal success adequately measured leaves you unable to judge healthy workflow or react to it when it could be improved.

I guess you could also disagree with all of this. I’d still argue that if you are in a business or venture that cares about being successful, you want to know that how you are going about your work actually matters. If it isn’t being done very efficiently, you want to know with reasonable certainty what part of your methodology is letting you down and respond to it. If you can’t put your finger on the problem and concretely say “this is wrong, let’s improve it” then you are not only avoiding potential success, but also missing out on amazing opportunities for learning and the challenge of solving interesting problems!

by oliver at December 13, 2016 06:32 PM

December 11, 2016

Toolsmith - GSE Edition: Image Steganography & StegExpose

Cross-posted on the Internet Storm Center Diary.

Updated with contest winners 14 DEC. Congrats to:
Chrissy @SecAssistance
Owen Yang @HomingFromWork
Paul Craddy @pcraddy
Mason Pokladnik - Fellow STI grad
Elliot Harbin @klax0ff

In the last of a three-part (Part 1-GCIH, Part 2-GCIA) series focused on tools I revisited during my GSE re-certification process, I thought it'd be timely and relevant to give you a bit of a walkthrough re: steganography tools. Steganography "represents the art and science of hiding information by embedding messages within other, seemingly harmless messages."
Stego has garnered quite a bit of attention again lately as party to both exploitation and exfiltration tactics. On 6 DEC 2016, ESET described millions of victims among readers of popular websites who had been targeted by the Stegano exploit kit hiding in pixels of malicious ads.
The Sucuri blog described credit card swipers in Magento sites on 17 OCT 2016, where attackers used image files as an obfuscation technique to hide stolen details from website owners, in images related to products sold on the victim website.

The GSE certification includes SANS 401 GSEC content, and Day 4 of the GSEC class content includes some time on steganography with the Image Steganography tool. Tools for steganographic creation are readily available, but a bit dated, including Image Steganography, last updated in 2011, and OpenStego, last updated in 2015. There are other, older command-line tools, but these two are really straightforward GUI-based options. Open source or free stego detection tools are unfortunately really dated and harder to find as a whole, unless you're a commercial tool user. StegExpose is one of a few open options that's fairly current (2015) and allows you to conduct steganalysis to detect LSB steganography in images. The LSB is the least significant bit in the byte value of an image pixel, and LSB-based image steganography embeds the hidden payload in the least significant bits of the pixel values of an image.
Image Steganography uses LSB steganography, making this a perfect opportunity to pit one against the other.
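To make the LSB idea concrete, here is a toy sketch of my own (the helper names are made up, and it operates on a single pixel byte rather than a real image): embedding overwrites the least significant bit of a pixel byte with one payload bit, so the byte value changes by at most 1 and the image looks unchanged.

```shell
#!/bin/sh
# Overwrite the least significant bit of a pixel byte with a payload bit.
embed_bit()   { echo $(( ($1 & ~1) | $2 )); }   # pixel byte + bit -> stego byte
extract_bit() { echo $(( $1 & 1 )); }           # stego byte -> payload bit

embed_bit 200 1     # pixel value 200 becomes 201, carrying a 1 bit
extract_bit 201     # recovers the 1
```

A real tool simply repeats this over enough pixel bytes to hold the whole payload, which is also why statistical detectors like StegExpose can spot it: the distribution of LSBs stops looking like natural image noise.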
Download Image Steganography from Codeplex, then run Image Steganography Setup.exe. Run Image Steganography after installation and select a PNG for your image. You can then type text you'd like to embed, or input data from a file. I chose wtf.png for my image, and rr.ps1 as my input file. I chose to write out the resulting stego sample to wtf2.png, as seen in Figure 1.

Figure 1: Image Steganography
This process in reverse, to decode a message, is just as easy. Select the decode radio button, and the UI will switch to decode mode. I dragged in the wtf2.png file I'd just created, and opted to write the output to the same directory, as seen in Figure 2.
Figure 2: wtf.png decoded

Pretty simple, and the extracted rr.ps1 file was unchanged from the original embedded file.
Now, will StegExpose detect this file as steganographic? Download StegExpose from Github, unpack, and navigate to the resulting directory from a command prompt. Run StegExpose.jar against the directory with your steganographic image as follows: java -jar StegExpose.jar c:\tmp\output. Sure enough, steganography confirmed as seen in Figure 3.
Figure 3: StegExpose
Not bad, right? Easy operations on both sides of the equation.

And now for a little contest. Five readers who email me via russ at holisticinfosec dot org and give me the most precise details regarding what I specifically hid in wtf2.png get a shout out here and $5 Starbucks gift cards for a little Christmastime caffeine.  

Contest: wtf2.png
Note: do not run the actual payload, it will annoy you to no end. If you must run it to decipher it, please do so in a VM. It's not malware, but again, it is annoying.

Cheers...until next time.

by Russ McRee at December 11, 2016 05:54 PM

December 09, 2016

Sean's IT Blog

Horizon 7 and App Volumes 2.x Updates

VMware has been committed to adding new features with every Horizon Suite point update, and the latest updates, announced yesterday, are no exception.

The Horizon 7.0.3 update, in conjunction with vSphere 6.5, adds several long-awaited features (announcement blog)(release notes).  These are:

  • Expanded Support for Windows 10
  • H.264 multimonitor support for Windows and Linux
  • A Universal Windows Platform Horizon Client for Blast Extreme
  • Linux Enhancements, including
    • Audio Input support
    • Ubuntu 16.04 Support
    • Clipboard Redirection for all supported versions
    • vGPU support for NVIDIA M6 GPUs for Red Hat Enterprise Linux desktops
  • Support for Windows Server 2016 Remote Desktop Session Hosts and single-use desktops

Two major vSphere 6.5 enhancements that impact Horizon were also highly touted yesterday.  The first is access to the Horizon API in PowerCLI 6.5.  This was released last week with vSphere 6.5, and both Alan Renouf and Thomas Brown have blog posts on how to access the API.  There is also a Github repository with examples of how to use the API.  This has been a long-awaited, and oft-requested, feature enhancement for Horizon.  While a View PowerCLI module has been included since View 4.5, it was very limited and hadn't been updated with the new features added to Horizon.  The new API access is very raw; there are currently only two cmdlets, and these are used for connecting to and disconnecting from the API.  However, I expect significant additions in future versions of PowerCLI.

The second big announcement is HA support for vGPU-enabled desktops in vSphere 6.5.  This is a huge announcement for customers that require vGPU for 3D workloads.  In previous versions of vSphere, if a host failed, vGPU-enabled desktops would not restart on another host.  This now provides some method of fault tolerance for these VMs.  vMotion is still not supported, and this is a much harder problem to tackle.

Also included with Horizon 7.0.3 is Access Point 2.8 and vRealize Operations for Horizon 6.4 (release notes).  vRealize Operations for Horizon includes several new features including:

  • Support for monitoring Horizon Access Point – including Access Point health and connection information
  • App Volumes support – monitoring which AppStacks are attached to a user session and how long they took to attach
  • New Widgets and reports on application usages in virtual desktop sessions
  • Support for monitoring Cloud Pod Architecture

App Volumes 2.12 was also released yesterday, and it brings significant improvements to the current branch of the application layering software. (Release notes)(Announcement Blog)

The new features in App Volumes are:

  • Logon Enhancements
  • Support for multiple domain controllers and multiple Active Directory forests and domains
  • Communications between App Volumes Manager and agent now default to HTTPS
  • Certificate validation required for communications between vCenter and App Volumes Manager
  • Support for Office 365 (2016) as an App Stack
  • Support for Windows 10 Anniversary Update (AKA Build 1607)

There are also a couple of new Tech Preview features that can be enabled in the latest version.  These features are:

by seanpmassey at December 09, 2016 01:10 PM

December 08, 2016


Announcement: “The Ultimate Game Boy Talk” at 33C3

I will present “The Ultimate Game Boy Talk” at the 33rd Chaos Communication Congress in Hamburg later in December.

If you are interested in attending the talk, please go to, select it and press submit, so the organizers can reserve a big enough room.

The talk continues the spirit of The Ultimate Commodore 64 Talk, which I presented at the same conference eight years ago, as well as several other talks in the series done by others: Atari 2600 (Svolli), Galaksija (Tomaž Šolc), Amiga 500 (rahra).

Here’s the abstract:

The 8-bit Game Boy was sold between 1989 and 2003, but its architecture more closely resembles machines from the early 1980s, like the Commodore 64 or the NES. This talk attempts to communicate “everything about the Game Boy” to the listener, including its internals and quirks, as well as the tricks that have been used by games and modern demos, reviving once more the spirit of times when programmers counted clock cycles and hardware limitations were seen as a challenge.

by Michael Steil at December 08, 2016 02:49 AM

December 07, 2016

Create a PDP-8 OS8 RK05 system disk from RX01 floppies with SIMH (and get text files in and out of the PDP-8)

This guide shows you how to build an RK05 bootable system disk with OS/8 on it for the PDP-8, in the SIMH emulator. We will use two RX01 floppies as the build source, copy over all the files and set up the LPT printer and the PTR/PIP paper tape punch/readers. As an added bonus, the article also shows you how to get text files in and out of the PDP-8 system using the printer and paper tape reader/puncher.

December 07, 2016 12:00 AM

December 06, 2016

Geek and Artist - Tech

What’s Missing From Online Javascript Courses

Perhaps the title is somewhat excessive, but it expresses how I feel about this particular topic. I’m not a “front-end person” (whatever that is) and feel much more comfortable behind an API where you don’t have to worry about design, markup, logic, styling as well as how to put them all together. That being said, I feel it’s an area where I should face my fears head-on, and so I’m doing some small side-projects on the web.

One thing that I realised I had no idea about, is how you actually get a web app running. I don’t mean starting a server, or retrieving a web page, or even which line of javascript is executed first. I’m talking about how you put all the scripts and bits of code in the right place that the browser knows when to run them, and you don’t make an awful mess in the process.

This can only be shown by example, so here’s what I inevitably start with:

    <script src="//" />
    <div id="data" />
    <script type="text/javascript">
      $.ajax({
        success: function(data) {
          $("data").text = data
        }
      })
    </script>

Yes, I’m using jQuery, and no, that code example is probably not entirely correct. I still find there is a reasonable period of experimentation involved before even the simple things, like an AJAX call to get some data from an API, are working. In any case, here we are with some in-line Javascript and things are generally working as expected. But of course we know that in-lining Javascript is not the way to a working, maintainable application, so as soon as we have something working, we should pull it into its own external script.

    <script src="//" />
    <script src="/javascripts/main.js" />
    <div id="data" />

Uh-oh, it stopped working. The code in main.js is the exact same as what we had in the document before but it is no longer functioning. Already, we are beyond what I’ve seen in most beginner Javascript online courses, yet this seems like a pretty fundamental issue. Of course, the reason is that the script has been loaded and executed in the same order as the script tags and before the HTML elements (including the div we are adding the data to) were present in the DOM.

So naturally we exercise jQuery and fix the problem, by only executing the code once the document is ready and the relevant event handler is fired:

    $(document).ready(function() {
      $.ajax({
        success: function(data) {
          $("data").text = data
        }
      })
    })

But now we have another problem. We’ve heard from more experienced developers that using jQuery is frowned upon, and although figuring out when the document is loaded seems simple enough to do without using a library, we’re not sure that there is a single cross-browser way of doing it reliably. So jQuery it is.

Actually there is another way, well explained here and seems to be well supported without relying on Javascript functions. You simply drop the “defer” keyword into the script tag you want to execute after parsing of the page, and it will now only run at the right time for our purposes:

    <script src="/javascripts/main.js" defer/>

I had never seen that before, but it was so simple. Many thanks to my coworkers Maria and Thomas for shedding a bit of light on this corner of the browser world for me. Of course, they also mentioned correctly that using jQuery is not an unforgivable sin, nor are some cases of in-line Javascript snippets (look at some of your favourite websites, even those from respected tech giants, and you will undoubtedly see some). But for a novice to web development it is sometimes hard to look beyond Hackernews and figure out what you are meant to be doing.

On to the next web challenge – mastering D3!

by oliver at December 06, 2016 08:54 PM

Multi-repo Git status checking script

I've got a whole bunch of Git repositories in my ~/Projects/ directory. All of those may have unstaged, uncommitted or unpushed changes. I find this hard to keep track of properly, so I wrote a script to do this for me. The output looks like this:


As you can see, it shows:

  • Untracked files: Files that are new, are unknown to git and have not been ignored.
  • Uncommitted changes: Files that are known to git and have changes which are not committed.
  • Needs push: New local commits which have not been pushed to the remote origin.

The script scans for .git directories under the given path, but only down to a certain depth. The default is 2, which means "all directories directly under this directory"; a value of 3 would scan two directories deep.
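The scanning step can be sketched like this (the function name is mine, not the script's): find the .git directories up to DEPTH levels below DIR and print each repository's working-tree path, ready for a `git status` per repo.

```shell
#!/bin/sh
# Find git repos: locate .git directories up to $2 levels below $1
# (default depth 2 = repos sitting directly under the given directory),
# and print the path of each working tree.
scan_repos() {
    dir="$1"; depth="${2:-2}"
    find "$dir" -maxdepth "$depth" -type d -name .git |
        while read -r gitdir; do
            printf '%s\n' "${gitdir%/.git}"
        done
}

# Usage, for example:
#   scan_repos ~/Projects | while read -r repo; do
#       git -C "$repo" status --short
#   done
```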

Full usage:

Usage: <DIR> [DEPTH=2]
Scan for .git dirs under DIR (up to DEPTH dirs deep) and show git status

Get it from the Github project page.

by admin at December 06, 2016 04:01 PM

December 03, 2016

Vincent Bernat

Build-time dependency patching for Android

This post shows how to patch an external dependency for an Android project at build-time with Gradle. This leverages the Transform API and Javassist, a Java bytecode manipulation tool.

buildscript {
    dependencies {
        classpath ''
        classpath ''
        classpath 'org.javassist:javassist:3.21.+'
        classpath 'commons-io:commons-io:2.4'
    }
}

Disclaimer: I am not a seasoned Android programmer, so take this with a grain of salt.


This section adds some context to the example. Feel free to skip it.

Dashkiosk is an application to manage dashboards on many displays. It provides an Android application you can install on one of those cheap Android sticks. Under the hood, the application is an embedded webview backed by the Crosswalk Project web runtime, which brings an up-to-date web engine even to older versions of Android[1].

Recently, a security vulnerability has been spotted in how invalid certificates were handled. When a certificate cannot be verified, the webview defers the decision to the host application by calling the onReceivedSslError() method:

Notify the host application that an SSL error occurred while loading a resource. The host application must call either callback.onReceiveValue(true) or callback.onReceiveValue(false). Note that the decision may be retained for use in response to future SSL errors. The default behavior is to pop up a dialog.

The default behavior is specific to the Crosswalk webview: the Android builtin one just cancels the load. Unfortunately, the fix applied by Crosswalk is different and, as a side effect, the onReceivedSslError() method is not invoked anymore[2].

Dashkiosk comes with an option to ignore TLS errors[3]. The mentioned security fix breaks this feature. The following example will demonstrate how to patch Crosswalk to recover the previous behavior[4].

Simple method replacement§

Let’s replace the shouldDenyRequest() method from the org.xwalk.core.internal.SslUtil class with this version:

// In SslUtil class
public static boolean shouldDenyRequest(int error) {
    return false;
}

Transform registration§

Gradle Transform API enables the manipulation of compiled class files before they are converted to DEX files. To declare a transform and register it, include the following code in your build.gradle:

import org.gradle.api.logging.Logger
import com.android.build.api.transform.*

class PatchXWalkTransform extends Transform {
    Logger logger = null;

    public PatchXWalkTransform(Logger logger) {
        this.logger = logger
    }

    @Override
    String getName() {
        return "PatchXWalk"
    }

    @Override
    Set<QualifiedContent.ContentType> getInputTypes() {
        return Collections.singleton(QualifiedContent.DefaultContentType.CLASSES)
    }

    @Override
    Set<QualifiedContent.Scope> getScopes() {
        return Collections.singleton(QualifiedContent.Scope.EXTERNAL_LIBRARIES)
    }

    @Override
    boolean isIncremental() {
        return true
    }

    @Override
    void transform(Context context,
                   Collection<TransformInput> inputs,
                   Collection<TransformInput> referencedInputs,
                   TransformOutputProvider outputProvider,
                   boolean isIncremental) throws IOException, TransformException, InterruptedException {
        // We should do something here
    }
}

// Register the transform
android.registerTransform(new PatchXWalkTransform(logger))

The getInputTypes() method should return the set of types of data consumed by the transform. In our case, we want to transform classes. Another possibility is to transform resources.

The getScopes() method should return a set of scopes for the transform. In our case, we are only interested in the external libraries. It’s also possible to transform our own classes.

The isIncremental() method returns true because we support incremental builds.

The transform() method is expected to take all the provided inputs and copy them (with or without modifications) to the location supplied by the output provider. We haven’t implemented this method yet, so as it stands, the transform removes all external dependencies from the application.

Noop transform§

To keep all external dependencies unmodified, we must copy them:

void transform(Context context,
               Collection<TransformInput> inputs,
               Collection<TransformInput> referencedInputs,
               TransformOutputProvider outputProvider,
               boolean isIncremental) throws IOException, TransformException, InterruptedException {
    inputs.each {
        it.jarInputs.each {
            def jarName = it.name
            def src = it.getFile()
            def dest = outputProvider.getContentLocation(jarName,
                                                         it.contentTypes, it.scopes,
                                                         Format.JAR)
            def status = it.getStatus()
            if (status == Status.REMOVED) { // ❶
                logger.info("Remove ${src}")
                dest.delete()
            } else if (!isIncremental || status != Status.NOTCHANGED) { // ❷
                logger.info("Copy ${src}")
                FileUtils.copyFile(src, dest)
            }
        }
    }
}

We also need a few additional imports:

import com.android.build.api.transform.Format
import com.android.build.api.transform.Status
import org.apache.commons.io.FileUtils
Since we are handling external dependencies, we only have to manage JAR files. Therefore, we only iterate on jarInputs and not on directoryInputs. There are two cases when handling incremental build: either the file has been removed (❶) or it has been modified (❷). In all other cases, we can safely assume the file is already correctly copied.
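The copy/remove decision above can be sketched language-neutrally. Here is a small Python model of it — the status names mirror Gradle’s Status enum, but this is not the Gradle API, just an illustration of the three branches:

```python
# Decide what to do with one JAR input, mirroring the Groovy logic above.
# Status names follow Gradle's Status enum; "delete"/"copy"/"skip" are
# simply labels for the three branches.
def action(status, is_incremental):
    if status == "REMOVED":
        return "delete"      # drop the stale copy from the output (like ❶)
    if not is_incremental or status != "NOTCHANGED":
        return "copy"        # full build, or the file was added/changed (like ❷)
    return "skip"            # incremental build and the file is untouched
```

On a non-incremental build, everything that still exists gets copied; only incremental builds can take the cheap "skip" path.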

JAR patching§

When the external dependency is the Crosswalk JAR file, we also need to modify it. Here is the first part of the code (replacing ❷):

if ("${src}" ==~ ".*/org.xwalk/xwalk_core.*/classes.jar") {
    def pool = new ClassPool()
    def ctc = pool.get('org.xwalk.core.internal.SslUtil') // ❸

    def ctm = ctc.getDeclaredMethod('shouldDenyRequest')
    ctc.removeMethod(ctm) // ❹

public static boolean shouldDenyRequest(int error) {
    return false;
""", ctc)) // ❺

    def sslUtilBytecode = ctc.toBytecode() // ❻

    // Write back the JAR file
    // …
} else {"Copy ${src}")
    FileUtils.copyFile(src, dest)

We also need the following additional imports to use Javassist:

import javassist.ClassPath
import javassist.ClassPool
import javassist.CtNewMethod

Once we have located the JAR file we want to modify, we add it to our classpath and retrieve the class we are interested in (❸). We locate the appropriate method and delete it (❹). Then, we add our custom method using the same name (❺). The whole operation is done in memory. We retrieve the bytecode of the modified class in ❻.

The remaining step is to rebuild the JAR file:

def input = new JarFile(src)
def output = new JarOutputStream(new FileOutputStream(dest))

// ❼
input.entries().each {
    if (!it.getName().equals("org/xwalk/core/internal/SslUtil.class")) {
        def s = input.getInputStream(it)
        output.putNextEntry(new JarEntry(it.getName()))
        IOUtils.copy(s, output)
        s.close()
    }
}

// ❽
output.putNextEntry(new JarEntry("org/xwalk/core/internal/SslUtil.class"))
output.write(sslUtilBytecode)

input.close()
output.close()

We need the following additional imports:

import java.util.jar.JarEntry
import java.util.jar.JarFile
import java.util.jar.JarOutputStream
import org.apache.commons.io.IOUtils

There are two steps. In ❼, all classes are copied to the new JAR, except the SslUtil class. In ❽, the modified bytecode for SslUtil is added to the JAR.
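Since a JAR is a plain ZIP archive, the same two steps can be expressed with any ZIP library. Here is a stdlib-only Python sketch of the replace-one-entry pattern (the helper name is mine, not part of the build script):

```python
import io
import zipfile

def replace_entry(jar_bytes: bytes, name: str, new_data: bytes) -> bytes:
    """Rebuild a JAR (a plain ZIP): copy every entry except `name`
    (the equivalent of step ❼), then append `name` with `new_data`
    (the equivalent of step ❽)."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for entry in src.infolist():
            if entry.filename != name:
                dst.writestr(entry, src.read(entry))
        dst.writestr(name, new_data)
    return out.getvalue()
```

Copying everything else first and appending the replacement keeps the archive valid without touching entry offsets by hand, which is exactly why the Groovy version filters on the entry name instead of patching in place.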

That’s all! You can view the complete example on GitHub.

More complex method replacement§

In the above example, the new method doesn’t use any external dependency. Let’s suppose we also want to replace the sslErrorFromNetErrorCode() method from the same class with the following one:


// In SslUtil class
public static SslError sslErrorFromNetErrorCode(int error,
                                                SslCertificate cert,
                                                String url) {
    switch(error) {
        case NetError.ERR_CERT_COMMON_NAME_INVALID:
            return new SslError(SslError.SSL_IDMISMATCH, cert, url);
        case NetError.ERR_CERT_DATE_INVALID:
            return new SslError(SslError.SSL_DATE_INVALID, cert, url);
        case NetError.ERR_CERT_AUTHORITY_INVALID:
            return new SslError(SslError.SSL_UNTRUSTED, cert, url);
    }
    return new SslError(SslError.SSL_INVALID, cert, url);
}
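The switch is a straight lookup, which can be modeled as a table. In the Python sketch below, the numeric codes follow Chromium’s net_error_list.h, and the SSL_* strings stand in for the android.net.http.SslError constants:

```python
# Chromium certificate error codes (see net_error_list.h).
ERR_CERT_COMMON_NAME_INVALID = -200
ERR_CERT_DATE_INVALID = -201
ERR_CERT_AUTHORITY_INVALID = -202

# Map each net error to the matching SslError constant name;
# anything unknown falls back to SSL_INVALID, as in the Java code.
_SSL_ERROR = {
    ERR_CERT_COMMON_NAME_INVALID: "SSL_IDMISMATCH",
    ERR_CERT_DATE_INVALID: "SSL_DATE_INVALID",
    ERR_CERT_AUTHORITY_INVALID: "SSL_UNTRUSTED",
}

def ssl_error_from_net_error(code):
    return _SSL_ERROR.get(code, "SSL_INVALID")
```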

The major difference with the previous example is that we need to import some additional classes.

Android SDK import§

The classes from the Android SDK are not part of the external dependencies. They need to be imported separately. The full path of the JAR file is:

androidJar = "${android.getSdkDirectory().getAbsolutePath()}/platforms/" +
             "${android.getCompileSdkVersion()}/android.jar"

We need to load it before adding the new method into the SslUtil class:

def pool = new ClassPool()
pool.insertClassPath(androidJar)
def ctc = pool.get('org.xwalk.core.internal.SslUtil')
def ctm = ctc.getDeclaredMethod('sslErrorFromNetErrorCode')
// …

External dependency import§

We must also import the classes used by the new method (such as NetError) and therefore, we need to put the appropriate JAR in our classpath. The easiest way is to iterate through all the external dependencies and add them to the classpath.

def pool = new ClassPool()
pool.insertClassPath(androidJar)
inputs.each {
    it.jarInputs.each {
        def jarName = it.name
        def src = it.getFile()
        def status = it.getStatus()
        if (status != Status.REMOVED) {
            pool.insertClassPath("${src}")
        }
    }
}

def ctc = pool.get('org.xwalk.core.internal.SslUtil')
def ctm = ctc.getDeclaredMethod('sslErrorFromNetErrorCode')
// Then, rebuild the JAR...

Happy hacking!

  1. Before Android 4.4, the webview was severely outdated. Starting from Android 5, the webview is shipped as a separate component with updates. Embedding Crosswalk is still convenient as you know exactly which version you can rely on. 

  2. I hope to have this fixed in later versions. 

  3. This may seem harmful and you are right. However, if you have an internal CA, it is currently not possible to provide a custom trust store to a webview. Moreover, the system trust store is not used either. You may also want to use TLS for authentication only, with client certificates, a feature supported by Dashkiosk.

  4. Crosswalk being an open source project, an alternative would have been to patch the Crosswalk source code and recompile it. However, Crosswalk embeds Chromium, and recompiling the whole thing consumes a lot of resources. 

by Vincent Bernat at December 03, 2016 10:20 PM

November 30, 2016

Carl Chenet

My Free Software activities in November 2016

My monthly report for November 2016 gives an extended list of my Free Software related activities during this month.

Personal projects:

Journal du hacker:

The Journal du hacker is a French-speaking Hacker News-like website dedicated to the French-speaking Free and Open Source Software community.


That’s all folks! See you next month!

by Carl Chenet at November 30, 2016 11:00 PM

November 28, 2016

Michael Biven

It Was Never About Ops

For a while I’ve been thinking about Susan J. Fowler’s Ops Identity Crisis post. Bits I agreed with and some I did not.

My original reaction to the post was pretty pragmatic. I had concerns (and still do) about adding more responsibilities onto software engineers (SWE). It’s already fairly common to have them responsible for QA, database administration, and security tasks, but now ops issues are being considered as well.

I suspect there is an as-yet-unmeasured negative impact to the productivity of teams that keep expanding the role of SWE. You end up deprioritizing the operations and systems related concerns, because new features and bug fixes will always win out when scheduling tasks.

Over time I’ve refined my reaction to this. The traditional operations role is the hole that you’ve been stuck in and engineering is how you get out of that hole. It’s not that you don’t need Ops teams or engineers anymore. It’s simply that you’re doing it wrong.

It was never solely about operations. There’s always been an implied hint of manual effort for most Ops teams. We’ve seen a quick return from having SWE handle traditional ops tasks, but that doesn’t mean that the role won’t be needed anymore. Previously we’ve been able to add people to ops to continue to scale with the growth of the company, but those days are gone for most. What needs to change is how we approach the work and the skills needed to do the job.

When you’re unable to scale to meet the demands you’ll end up feeling stuck in a reactive and constantly interruptible mode of working. This can then make operations feel more like a burden rather than a benefit. This way of thinking is part of the reason why I think many of the tools and services created by ops teams are not thought of as actual products.

Ever since we got an API to interact with the public cloud, and later the private cloud, we’ve been trying to redefine the role of ops. As the ecosystem of tools has grown and changed over time we’ve continued to refine that role. While thinking on the impact of Fowler’s post, I know that I agree with her that the skills needed are different from what they were eight years ago, but the need for the role hasn’t decreased. Instead it’s grown to match the growth of the products it has been supporting. This got me thinking about how I’ve been working during those eight years, and looking back it’s easy to see what worked and what didn’t. These are the bits that worked for me.

First don’t call it ops anymore. Sometimes names do matter. By continuing to use “Ops” in the team name or position we continue to hold onto that reactive mindset.

Make scalability your main priority and start building the products and services that become the figurative ladder to get you out of the hole you’re in. I believe you can meet that by focusing on three things: Reliability, Availability, and Serviceability.

For anything to scale it first needs to be reliable and available if people are going to use it. To be able to meet the demands of the growth you’re seeing the products need to be serviceable. You must be able to safely make changes to them in a controlled and observable fashion.

Every product built out of these efforts should be considered a first class citizen. The public facing services and the internal ones should be considered equals. If your main priority is scaling to meet the demands of your growth, then there should be no difference in how you design, build, maintain, or consider anything no matter where it is in the stack.

Focus on scaling your engineers and making them force multipliers for the company. There is a cognitive load placed on individuals, teams, and organizations for the work they do. Make sure to consider this in the same way we think of the load capacity of a system. At a time where we’re starting to see companies break their products into smaller more manageable chunks (microservices), we’re close to doing the exact opposite for our people and turning the skills needed to do the work into one big monolith.

If you’ve ever experienced yourself or have seen any of your peers go through burnout what do you think is going to happen as we continue to pile on additional responsibilities?

The growth we’re seeing is the result of businesses starting to run into web scale problems.

Web scale describes the tendency of modern sites – especially social ones – to grow at (far-)greater-than-linear rates. Tools that claim to be “web scale” are (I hope) claiming to handle rapid growth efficiently and not have bottlenecks that require rearchitecting at critical moments. The implication for “web scale” operations engineers is that we have to understand this non-linear network effect. Network effects have a profound effect on architecture, tool choice, system design, and especially capacity planning.

Jacob Kaplan-Moss

The real problem I think we’ve always been facing is making sure you have the people you need to do the job. Before we hit web scale issues we could usually keep up by having people pull all-nighters, work through weekends, or, if you’re lucky, by hiring more. But hard work can no longer make up for shortcomings in planning or preparation. The problem has never been with the technology or the challenges you’re facing. It’s always been about having the right people.

In short you can …

  1. Expect services to grow at a non-linear rate.
  2. To be able to keep up with this growth you’ll need to scale both your people and your products.
  3. Scale your people by giving them the time and space to focus on scaling your products.
  4. Scale your products by focusing on Reliability, Availability, and Serviceability.

To think that new tools or services will be the only (or main) answer to the challenges brought on by growth is a mistake. You will always need people to understand what is happening and then design, implement, and maintain your solutions. These new tools and services should increase the impact of each member of your team, but it is a mistake to think they will replace a role.

November 28, 2016 11:11 PM

November 27, 2016

Toolsmith - GSE Edition: Scapy vs CozyDuke

In continuation of observations from my GIAC Security Expert re-certification process, I'll focus here on a GCIA-centric topic: Scapy. Scapy is essential to the packet analyst skill set on so many levels. For your convenience, the Packetrix VM comes preconfigured with Scapy and Snort, so you're ready to go out of the gate if you'd like to follow along for a quick introduction.
Scapy is "a powerful interactive packet manipulation program. It is able to forge or decode packets of a wide number of protocols, send them on the wire, capture them, match requests and replies, and much more." This includes the ability to handle most tasks such as scanning, tracerouting, probing, unit tests, attacks or network discovery, thus replacing functionality expected from hping, 85% of nmap, arpspoof, tcpdump, and others.
If you'd really like to dig in, grab TJ O'Connor's  Violent Python: A Cookbook for Hackers, Forensic Analysts, Penetration Testers and Security Engineers (you should already have it), as first discussed here in January 2013. TJ loves him some Scapy: Detecting and Responding to Data Link Layer Attacks is another reference. :-)
You can also familiarize yourself with Scapy's syntax in short order with the SANS Scapy Cheat Sheet as well.
Judy Novak's SANS GIAC Certified Intrusion Analyst Day 5 content offers a nice set of walk-throughs using Scapy, and given that it is copyrighted and private material, I won't share them here, but will follow a similar path so you have something to play along with at home. We'll use a real-world APT scenario given recent and unprecedented Russian meddling in American politics. According to SC Magazine, "Russian government hackers apparently broke into the Democratic National Committee (DNC) computer systems" in infiltrations believed to be the work of two different Russian groups, namely Cozy Bear/CozyDuke/APT 29 and Fancy Bear/Sofacy/APT 28, working separately. As is often the case, ironically and consistently, one of the best overviews of CozyDuke behaviors comes via Kaspersky's Securelist. This article is cited as the reference in a number of Emerging Threats Snort/Suricata rules for CozyDuke. Among them, 2020962 - ET TROJAN CozyDuke APT HTTP Checkin, found in the trojan.rules file, serves as a fine exemplar.
I took serious liberties with the principles of these rules and oversimplified things significantly with a rule as added to my local.rules file on my Packetrix VM. I then took a few quick steps with Scapy to ensure that the rule would fire as expected. Of the IOCs derived from the Securelist article, we know a few things that, if built into a PCAP with Scapy, should cause the rule to fire when the PCAP is read via Snort.
  1. CozyDuke client to C2 calls were over HTTP
  2. Requests for C2 often included a .php reference, URLs included the likes of /ajax/index.php
  3. One of the C2 IPs cited in the article can be used as an example destination IP address
The resulting simpleton Snort rule appears in Figure 1.

Figure 1: Simple rule
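Figure 1 itself is not reproduced here, but an oversimplified rule in the spirit described, built only from the IOCs above, might look like the following (illustrative only; the msg, sid, and variables are placeholders, not the Emerging Threats rule):

alert tcp $HOME_NET any -> $EXTERNAL_NET 80 (msg:"LOCAL CozyDuke APT HTTP checkin (example)"; flow:established,to_server; content:"/ajax/index.php"; http_uri; classtype:trojan-activity; sid:1000001; rev:1;)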
To quickly craft a PCAP to trigger this rule, at a bash prompt, I ran scapy, followed by syn = IP(src="", dst="")/TCP(sport=1337, dport=80, flags="S")/"GET /ajax/index.php HTTP/1.1", then wrote the results out with wrpcap("/tmp/CozyDukeC2GET.pcap", syn), as seen in Figure 2.

Figure 2: Simple Scapy
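Under the hood, wrpcap() simply serializes packets into the classic libpcap container. For the curious, here is a stdlib-only Python sketch of that format (the function is mine, not Scapy's):

```python
import struct

def write_pcap(path, packets, linktype=1):
    """Write (timestamp, raw_bytes) pairs in the classic libpcap
    format -- the same container Scapy's wrpcap() emits.
    linktype=1 is LINKTYPE_ETHERNET."""
    with open(path, "wb") as f:
        # Global header (24 bytes): magic, major/minor version,
        # thiszone, sigfigs, snaplen, network (linktype).
        f.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, linktype))
        for ts, data in packets:
            sec = int(ts)
            usec = int(round((ts - sec) * 1_000_000))
            # Per-packet record header (16 bytes):
            # ts_sec, ts_usec, incl_len, orig_len.
            f.write(struct.pack("<IIII", sec, usec, len(data), len(data)))
            f.write(data)
```

Reading the result back with tcpdump -r or Snort works because the 24-byte global header plus 16-byte per-record headers are all there is to the container.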
Then a quick run of the resulting file through Snort with snort -A console -q -K none -r /tmp/CozyDukeC2GET.pcap -c ../etc/snort.conf, and we have a hit as seen in Figure 3.

Figure 3: Simple result

Scapy is ridiculously powerful and is given no justice here; hopefully this is just enough information to entice you to explore further. With just the principles established here, you can list the options available for crafting and manipulating packets with ls(TCP) and ls(IP).
Figure 4: ls()

If you're studying for the likes of GCIA or just looking to improve your understanding of TCP/IP and NSM, no better way to do so than with Scapy.
Cheers...until next time.

by Russ McRee at November 27, 2016 07:04 PM