Planet SysAdmin

January 17, 2017

Despite revoked CAs, StartCom and WoSign continue to sell certificates

As it stands, the HTTPS "encrypted web" is built on trust. We use browsers that trust that Certificate Authorities secure their infrastructure and deliver TLS certificates (1) after validating and verifying the request correctly.

It's all about trust. Browsers trust those CA root certificates and in turn, they accept the certificates that the CA issues.

(1) Let's all agree to never call it SSL certificates ever again.

Revoking trust

Once in a while, Certificate Authorities misbehave. They might have bugs in their validation procedures that have led to TLS certificates being issued to requesters who had no access to the domains in question. It's happened for Gmail, ... you can probably guess the likely targets.

When that happens, an investigation is performed -- in the open -- to ensure the CA has taken adequate measures to prevent it from happening again. But sometimes, those CAs don't cooperate. As is the case with StartCom (StartSSL) and WoSign, whose certificates will start to show as invalid in the next Chrome update.

Google has determined that two CAs, WoSign and StartCom, have not maintained the high standards expected of CAs and will no longer be trusted by Google Chrome, in accordance with our Root Certificate Policy.

This view is similar to the recent announcements by the root certificate programs of both Apple and Mozilla.

Distrusting WoSign and StartCom Certificates

So Apple (Safari), Mozilla (Firefox) and Google (Chrome) are about to stop trusting the StartCom & WoSign TLS certificates.

From that point forward, those sites will show certificate warnings.

With Mozilla, Chrome & Safari, that's 80% of the browser market share blocking those Certificate Authorities.

Staged removal of CA trust

Chrome is handling the update sensibly: it'll start by distrusting the most recent certificates first, and gradually block the entire CA.

Beginning with Chrome 56, certificates issued by WoSign and StartCom after October 21, 2016 00:00:00 UTC will not be trusted. [..]

In subsequent Chrome releases, these exceptions will be reduced and ultimately removed, culminating in the full distrust of these CAs.

Distrusting WoSign and StartCom Certificates

If you purchased a TLS certificate from either of those 2 CAs in the last 2 months, it won't work in Chrome, Firefox or Safari.

Customer Transparency

Those 3 browsers have essentially just bankrupted those 2 CAs. Surely, if your certificates are not going to be accepted by 80% of the browsers, you're out of business -- right?

Those companies don't see it that way, apparently, as they still sell new certificates online.

This is pure fraud: they're willingly selling certificates that are known to stop working in all major browsers.

Things like that piss me off, because only a handful of IT experts know that those Certificate Authorities are essentially worthless. But they're still willing to accept money from unsuspecting individuals wishing to secure their sites.

I guess they proved once again why they should be distrusted in the first place.

Guilt by Association

Part of the irony is that StartCom, which runs StartSSL, didn't actually do anything wrong. But a few years ago, they were bought by WoSign. In that process, StartCom replaced its own process and staff with those of WoSign, essentially copying the bad practices that WoSign had.

If StartCom hadn't been bought by WoSign, they'd still be in business.

I'm looking forward to the days when we have an easy-to-use, secure, decentralized alternative to Certificate Authorities.

by Mattias Geniar at January 17, 2017 08:30 AM

Simon Lyall 2017 – Tuesday – Session 3

The Internet of Scary Things – tips to deploy and manage IoT safely Christopher Biggs

  • What you need to know about the Toaster Apocalypse
  • Came to prominence in late 2016 when major sites were hit by DDoS attacks from compromised devices
  • Risks of images being grabbed
    • Targeted intrusion
    • Indiscriminate harvesting of images
    • Drive-by pervs
    • State actors
  • Unauthorized control
    • Hit traffic lights, doorbells
  • Takeover of entire devices
    • Used for DDOS
    • Demanding payment for the owner to get control of them back.
  • “The firewall doesn’t divide the scary Internet from the safe LAN, the monsters are in the room”


  • Poor Security
    • Mostly just laziness and bad practices
    • Hard for end-users to configure (especially non-techies)
    • Similar to where servers, Internet software and PCs were 20 years ago
  • Low Interop
    • Everyone uses own cloud services
    • Only just started getting common protocols and standards
  • Limited Maintenance
    • No support, no updates, no patches
  • Security is Hard
  • Laziness
    • Threat surface is too large
    • Telnet is too easy for devs
    • Most things don’t need full Linux installs
  • No incentives
    • Owner might not even notice if compromised
    • No incentive for vendors to make them better


  • Examples
    • Cameras with telnet open, default passwords (that cannot be changed)
    • exe to access
    • Send UDP to enable a telnet port
    • Bad Mobile apps


  • Selecting a device
    • Accept you will get bad ones, will have to return
    • Scan your own network, you might not know something is even wifi enabled
    • Port scan devices
    • Stick with the “Big 3” frameworks (Apple, Google, Amazon)
    • Make sure it supports open protocols (indicates serious vendor)
    • Check if open source firmware or clients exist
    • Check for reviews (especially negative) or teardowns


  • Defensive arch
    • Put on its own network
    • Turn off or block UPnP opening firewall holes
    • Plan for breaches
      • Firewall rules, rate limited, recheck now and then
    • BYO cloud (don’t use the vendor cloud)
      • HomeBridge
      • Node-RED (Alexa)
      • Zoneminder, Motion for cameras
  • Advice for devs
    • Apple HomeKit (or at least support for Homebridge for less commercial)
    • Amazon Alexa and AWS IoT
      • Protocols open but look nice
    • UCF uPnP and SNP profiles
      • Device discovery and self discovery
      • Reference implementations available
    • NoApp setup as an alternative
      • Have an API
    • Support MQTT
    • Long Term support
      • Put copy of docs in device
      • Decide what and for how long you will support it, and be up front about it
    • Limit what you put on the device
      • Don’t just ship a Unix PC
      • Take out debug stuff when you ship
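
The “own network” plus firewall rules advice above can be sketched with iptables. Everything here is hypothetical: the interface names (iot0, lan0, wan0), the idea of a dedicated IoT VLAN, and the rate limit are illustrative, not a recommendation for any particular device.

```shell
# Hypothetical layout: IoT devices on their own network segment (iot0),
# trusted machines on lan0, upstream Internet on wan0.

# Block connections initiated from the IoT side into the trusted LAN...
iptables -A FORWARD -i iot0 -o lan0 -m state --state NEW -j DROP
# ...but let the LAN reach the devices, and allow replies back.
iptables -A FORWARD -i lan0 -o iot0 -j ACCEPT
iptables -A FORWARD -i iot0 -o lan0 -m state --state ESTABLISHED,RELATED -j ACCEPT

# Rate-limit new outbound TCP connections from IoT devices, as damage
# control if one is conscripted into a DDoS botnet.
iptables -A FORWARD -i iot0 -o wan0 -p tcp --syn -m limit --limit 10/second -j ACCEPT
iptables -A FORWARD -i iot0 -o wan0 -p tcp --syn -j DROP
```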


  • Trends
    • Standards
      • BITAG
      • Open Connectivity Foundation
      • Regulation?
    • Google Internet of things
    • Apple HomeKit
    • Amazon Alexa
      • Worry about privacy
    • Open Connectivity Foundation – IoTivity
      • Open source etc
      • Linux and Docker based
    • Consumer IDS – FingBox
  • Missing
    • Network access policy framework shipped
    • Initial network authentication
    • Vulnerability alerting
    • Patch distribution

Rage Against the Ghost in the Machine – Lilly Ryan

  • What is a Ghost?
    • The split between the mind and the body (dualism)
    • The thing that makes you you, separate to the meat of your body
  • Privacy
    • Privacy for information, not physical things
    • The mind has been a private place
    • eg “you might have thought about robbing a bank”
    • The thoughts we express are what is public.
    • Always been private since we never had technology to get in there
    • Companies and governments can look into your mind via things like your google queries
    • We can emulate the inner person not just the outer expression
  • How to Summon a Ghost
    • Digital re-creation of a person by a bot or another machine
    • Take information that you post online
    • Likes on facebook, length of time between clicks
  • Ecto-meta-data
    • Take meta data and create something like you that interacts
  • The Smartphone
    • Collects meta-data that doesn’t get posted publicly
    • deleted documents
    • editing of stuff
    • search history
    • pattern of jumping between apps
  • The Public meta-data that you don’t explicitly publish
    • In future, the sum of your public behaviour could be used to emulate you
  • What do we do with a ghost?
    • Create chatbots or online profiles that emulate a person
    • Talk to a Ghost of yourself
    • Put a Ghost to work. The 3rd party owns the data
    • Customer service bot, PA
    • Chris Hemsworth could be your PA
    • Money will go to facebook or Google
  • Less legal stuff
    • Information can leak from big companies
  • How to Banish a Ghost
    • Option to donating to the future
    • currently no regulation or code of conduct
    • Restrict data you send out
      • Don’t use the Internet
      • Be anonymous
      • Hard to do when cookies match you across many sites
        • You can install cookie blocker
    • Which networks you connect to
      • eg list of Wifi networks match you with places and people
      • Mobile network streams location data
      • location data reveals not just where you go but what stores, houses or people you are near
      • Turn off wifi, bluetooth or data when you are not using. Use VPNs
    • Law
      • Lobby and push politicians
      • Push back on companies
    • For technologists
      • Collect the minimum, not the maximum

FreeIPA project update (turbo talk) – Fraser Tweedale

  • Central Identity manager
  • LDAP + Kerberos, CA, DNS, admin tools, client. Hooks into AD
  • Manage via web or client
  • Client SSSD. Used by various distros
  • What is in the next release
    • Sub-CAs
    • Can require 2FA for important services
    • KDC Proxy
    • Network bound encryption, i.e. needs to talk to a local server to decrypt a disk
    • User Session recording


Minimum viable magic

Politely socially engineering IRL using sneaky magician techniques – Alexander Hogue

  • Putting things up your sleeve is actually hard
  • Minimum viable magic
  • Misdirect the eyes
  • Eyes only move in a straight line
  • Exploit pattern recognition
  • Exploit the spot light
  • Your attention is a resource


by simon at January 17, 2017 06:25 AM

Chris Siebenmann

Making my machine stay responsive when writing to USB drives

Yesterday I talked about how writing things to USB drives made my machine not very responsive, and in a comment Nolan pointed me to LWN's The pernicious USB-stick stall problem. According to LWN's article, the core problem is an excess accumulation of dirty write buffers, and they give some VM system sysctls that you can use to control this.

I was dubious that this was my problem, for two reasons. First, I have a 16 GB machine and I barely use all that memory, so I thought that allowing a process to grab a bit over 3 GB of them for dirty buffers wouldn't make much of a difference. Second, I had actually been running sync frequently (in a shell loop) during the entire process, because I have sometimes had it make a difference in these situations; I figured frequent syncs should limit the amount of dirty buffers accumulating in general. But I figured it couldn't hurt to try, so I used the dirty_background_bytes and dirty_bytes settings to limit this to 256 MB and 512 MB respectively and tested things again.

It turns out that I was wrong. With these sysctls turned down, my machine stayed quite responsive for once, despite me doing various things to the USB flash drive (including things that had had a terrible effect just yesterday). I don't entirely understand why, though, which makes me feel as if I'm doing fragile magic instead of system tuning. I also don't know if setting these down is going to have a performance impact on other things that I do with my machine; intuitively I'd generally expect not, but clearly my intuition is suspect here.
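
For reference, those two settings translate to byte values like this (a sketch: applying them requires root, and sysctl -w changes don't persist across reboots):

```shell
# 256 MB of dirty data before background writeback kicks in, and a
# 512 MB hard limit where writers block (the values from this entry).
dirty_bg=$((256 * 1024 * 1024))   # 268435456
dirty_max=$((512 * 1024 * 1024))  # 536870912
echo "vm.dirty_background_bytes = $dirty_bg"
echo "vm.dirty_bytes = $dirty_max"
# To actually apply them (as root):
#   sysctl -w vm.dirty_background_bytes="$dirty_bg"
#   sysctl -w vm.dirty_bytes="$dirty_max"
```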

(Per this Bob Plankers article, you can monitor the live state of your system with egrep 'dirty|writeback' /proc/vmstat. This will tell you the number of currently dirty pages and the thresholds (in pages, not bytes). I believe that nr_writeback is the number of pages actively being flushed out at the moment, so you can also monitor that.)

PS: In a system with drives (and filesystems) of vastly different speeds, a global dirty limit or ratio is a crude tool. But it's the best we seem to have on Linux today, as far as I know.

(In theory, modern cgroups support the ability to have per-cgroup dirty_bytes settings, which would let you add extra limits to processes that you knew were going to do IO to slow devices. In practice this is only supported on a few filesystems and isn't exposed (as far as I know) through systemd's cgroups mechanisms.)

by cks at January 17, 2017 05:37 AM

Simon Lyall 2017 – Tuesday – Session 2

Stephen King’s practical advice for tech writers – Rikki Endsley

  • Example What and Whys
    • Blog post, press release, talk to managers, tell devs the process
    • 3 types of readers: Lay, Managerial, Experts
  • Resources:
    • Press: The care and Feeding of the Press – Esther Schindler
    • Documentation: RTFM? How to write a manual worth reading


  • “On Writing: A memoir of the craft” by Stephen King
  • Good writing requires reading
    • You need to read what others in your area or topic or competition are writing
  • Be clear on Expectations
    • See examples
    • Howto Articles by others
    • Writing an Excellent Post-Event Wrap Up report by Leslie Hawthorn
  • Writing for the Expert Audience
    • New Process for acceptance of new modules in Extras – Greg DeKoenigsberg (Ansible)
    • vs Ansible Extras Modules + You – Robyn Bergeron
      • Defines audience in the intro


  • Invite the reader in
  • Opening Line should Invite the reader to begin the story
  • Put in an explicit outline at the start


  • Tell a story
  • That is the object of the exercise
  • Don’t do other stuff


  • Leave out the boring parts
  • Just provide links to the details
  • But sometimes, if people are not experts, you need to provide more detail


  • Sample outline
    • Intro (invite reader in)
    • Brief background
    • Share the news (explain solution)
    • Conclude (include important dates)


  • Sample Outline: Technical articles
  • Include a “get technical” section after the news.
  • Too much stuff to copy all down, see slides


  • To edit is divine
  • Come back and look at it afterwards
  • Get somebody who will be honest to do this


  • Write for


  • Q: How do you deal with skimmers?   A: Structure, headers
  • Q: Pet Peeves?  A: Strong intro, People using “very” or “some” , Leaving out important stuff




by simon at January 17, 2017 04:06 AM

Sarah Allen

remaining awake through a revolution

This day we honor Martin Luther King, Jr. who eloquently described the change that was, and is still, happening in our society. He often referred to the dramatic changes in technology, alongside other changes which require actions from each of us to make happen.

This morning I listened to “Remaining Awake Through a Great Revolution,” a speech delivered by Dr. Martin Luther King, Jr. on March 31 1968. He spoke of a triple revolution with advances in technology, weapons, and human rights. He talks about how we as individuals must accept responsibility and create change, not just in our own behavior, but changing the institutions we are part of.

one of the great liabilities of life is that all too many people find themselves living amid a great period of social change, and yet they fail to develop the new attitudes, the new mental responses, that the new situation demands.

His introduction refers to the story of Rip Van Winkle. We all remember how he slept for 20 years, but I had forgotten exactly what he slept through. He went to sleep under the reign of King George and woke up when George Washington was President — he slept through the American revolution. This story is an apt metaphor for today’s political and social climate. If we don’t talk together about what is happening in our world and work together to make change, we are sleeping. In King’s words: “anyone who feels that he can live alone is sleeping through a revolution.”

Here are some highlights of this speech that are still true today, and inspire me to work towards the kind of world where I want to live, which I believe is still possible:

Through our scientific and technological genius, we have made of this world a neighborhood and yet we have not had the ethical commitment to make of it a brotherhood. But somehow, and in some way, we have got to do this. We must all learn to live together as brothers or we will all perish together as fools. We are tied together in the single garment of destiny, caught in an inescapable network of mutuality. And whatever affects one directly affects all indirectly. For some strange reason I can never be what I ought to be until you are what you ought to be. And you can never be what you ought to be until I am what I ought to be. This is the way God’s universe is made; this is the way it is structured…

We need all of the talents and potential of the people in the world to solve the challenges that face us. Let’s look out for the individuals in our daily lives who aren’t getting the opportunities to rise to their potential.

It is an unhappy truth that racism is a way of life for the vast majority of white Americans, spoken and unspoken, acknowledged and denied, subtle and sometimes not so subtle—the disease of racism permeates and poisons a whole body politic. And I can see nothing more urgent than for America to work passionately and unrelentingly—to get rid of the disease of racism.

The opportunity to speak out against racism rises up without warning. I have found myself often unprepared to speak in the moment, and so have worked on practices which cause me to be mindful and take small, quiet actions in my daily life. I volunteer for Bridge Foundry, learning how to work with diverse teams, teaching what I’ve learned to make our tech culture more inclusive and welcoming to people who have traditionally been excluded. I’ve learned about history, so I can tell lesser-known stories, and try to pay attention to present-day voices that deserve to be amplified. Often when I’m about to share an article, I take a little extra time to look up the person who wrote it. I think about how this person’s experience and culture intersect with mine. I do a little more digging and read a few more articles and sometimes choose to share a different one. I enjoy finding new voices. I seek to be intentional about the people who influence me.

we have difficult days ahead in the struggle for justice and peace, but I will not yield to a politic of despair. I’m going to maintain hope… This time we will really confront a Goliath. God grant that we will be that David of truth set out against the Goliath of injustice, the Goliath of neglect, the Goliath of refusing to deal with the problems, and go on with the determination to make America the truly great America that it is called to be.

The world is changing, always. We need to work together, and I’m not just referring to a mass movement to curb injustice and stand up for what’s right (though I hope to be part of that). I believe we need to look for ways to work together as individuals, to speak up in the moment, to address the small injustices that we witness (and participate in) every day.

I don’t intend to appropriate the words of Dr. Martin Luther King. This speech was as much about peace, as it was about racial injustice. It is my hope that with this small blog post I might highlight how his teachings are still very applicable today. I hope someone will be inspired to read or listen to the whole original speech, and that everyone will be inspired to and feel obliged to create positive change in the world.

With this faith we will be able to hew out of the mountain of despair the stone of hope. With this faith we will be able to transform the jangling discords of our nation into a beautiful symphony of brotherhood.

The post remaining awake through a revolution appeared first on the evolving ultrasaurus.

by sarah at January 17, 2017 01:23 AM

Simon Lyall 2017 – Tuesday Session 1

Fishbowl discussion – GPL compliance Karen M. Sandler

  • Fishbowl format
    • 5 seats at front of the room, 4 must be occupied
    • If person has something to say they come up and sit in spare chair, then one existing person must sit down.
  • Topics
    • Conflicts of Law
    • Mixing licences
    • Implied warranty
    • Corporate Procedures and application
    • Get knowledge of free licences into the law school curriculum
  • “Being the Open Source guy at Oracle has always been fun”
  • “Our large company has spent 2000 hours with a young company trying to fix things up because their license is not GPL compliant”
  • BlackDuck is a commercial company that will review your company’s code looking for GPL violations. Some others too
    • “Not a perfect magical tool by any stretch”
    • Fossology is alternative open tool
    • Whole business model around license compliance, mixed in with security
    • Some of these companies are Kinda Ambulance chasers
    • “Don’t let those companies tell you how to run your business”
    • “Compliance industry complex” , “Compliance racket”
  • At my employer we have a tool that just greps for a “GPL” license in code, better than nothing.
  • Lots of fear in this area over Open-source compliance lawsuits
    • Disagreements in community if this should be a good idea
    • More, Less, None?
    • “As a Lawyer I think there should definitely be more lawsuits”
    • “A lot of large organisations will ignore anything less than [a lawsuit] “
    • “Even today I deal with organisations who reference the SCO period and fear widespread lawsuits”
  • Have Lawsuits chilled adoption?
    • Yes
    • Chilled adoption of free software vs GPL software
    • “Android has a policy of no GPL in userspace” , “they would replace the kernel if they could”
    • “Busybox lawsuits were used as a club to get specs so the kernel devs could create drivers” , this is not really applicable outside the kernel
    • “My goal in doing enforcement was to ensure somebody with a busybox device could compile it”
    • “Lawyers hate any license that prevents them getting future work”
    • “The amount of GPL violations skyrocketed with embedded devices shipping with Linux and GPL software”
  • People are working on a freer (eg “Not GPL”) embedded stack to replace Android userspace: Toybox, Toolbox. No kernel replacement yet.
  • Employees and Compliance
    • A large company helping out with a charity’s systems was unable to put AGPL software from that company on their laptops
    • “Contributing software upstream makes you look good and makes your company look good” , Encourages others and you can use their contributions
    • Work you do on your volunteer days at the company does not fall under the software assignment policy etc, but they still can’t install random stuff on their machines.
  • Websites often are not GPL compliant: heavy restrictions, users giving up their licenses.
  • “Send your lawyers a video of another person in a suit talking about that topic”

U 2 can U2F Rob N ★

  • Existing devices are not terribly secure, but better than nothing; usability sucks
  • Universal Two-Factor
    • Open Standard by FIDO alliance
    • USB, NFC, Bluetooth
    • Multiple server and host implementations
    • One device, multiple sites
    • Cloning protection
  • Interesting Examples
  • User experience: Login, press the button twice.
  • Under the hood a lot more complicated
    • Challenge from site, device must sign challenge (including website URL, to prevent a phishing site proxying)
    • Multiple keypairs for each website on device
    • Has a login counter on the device included in the signature, so the server can panic when the counter gets out of sync from a cloned device
  • Attestation Certificate
    • Shared across model or production batch
  • Browserland
    • Javascript
    • Chrome-based support is good
    • Firefox via extension (Native “real soon now”)
    • Mobile works on Android + Chrome + Google Authenticator


by simon at January 17, 2017 01:17 AM

January 16, 2017

Google Infrastructure Security Design Overview

This is quite a fascinating document highlighting everything (?) Google does to keep its infrastructure safe.

And to think we're still trying to get our users to generate random, unique, passphrases for every service.

Secure Boot Stack and Machine Identity

Google server machines use a variety of technologies to ensure that they are booting the correct software stack. We use cryptographic signatures over low-level components like the BIOS, bootloader, kernel, and base operating system image. These signatures can be validated during each boot or update. The components are all Google-controlled, built, and hardened. With each new generation of hardware we strive to continually improve security: for example, depending on the generation of server design, we root the trust of the boot chain in either a lockable firmware chip, a microcontroller running Google-written security code, or the above mentioned Google-designed security chip.

Each server machine in the data center has its own specific identity that can be tied to the hardware root of trust and the software with which the machine booted. This identity is used to authenticate API calls to and from low-level management services on the machine.

Source: Google Infrastructure Security Design Overview

by Mattias Geniar at January 16, 2017 09:50 PM

Everything Sysadmin

RIP John Boris

John was active in the LOPSA community. I saw him at nearly every LOPSA-NJ meeting, where he was active in planning and hosting the meetings. He was also on the board of LOPSA (national) where he will be greatly missed.

John was also a football coach at the school where he worked in the IT department. It was very clear that his coaching skills were something he applied everywhere, including his helpfulness and mentoring at LOPSA.

I had a feeling that when I hugged him at the end of the January LOPSA meeting it might be the last time I saw him. He was recovering from bypass surgery and was looking worn. He was chipper and friendly as always. He was a good guy. Easy to get along with. He kept LOPSA-NJ and many other projects going.

John Boris passed away last night.

I'll miss him.

by Tom Limoncelli at January 16, 2017 04:00 PM

WordPress to get secure, cryptographic updates

Exciting work is being done with regards to the WordPress auto-update system that allows the WordPress team to sign each update.

That signature can be verified by each WordPress installation to guarantee you're installing the actual WordPress update and not something from a compromised server.

Compromising the WordPress Update Infrastructure

This work is being led by security researcher Scott Arciszewski from Paragon Initiative, a long-time voice in the PHP security community. He's been warning about the potential dangers of the WordPress update infrastructure for a very long time.

Scott and I discussed it in the SysCast podcast about Application Security too.

Since WordPress 3.7, support has been added to auto-update WordPress installations in case critical vulnerabilities are discovered. I praised them for that -- I really love that feature. It requires 0 effort from the website maintainer.

But that obviously poses a threat, as Scott explains:

Currently, if an attacker can compromise, they can issue a fake WordPress update and gain access to every WordPress install on the Internet that has automatic updating enabled. We're two minutes to midnight here (we were one minute to midnight before the Wordfence team found that vulnerability).

Given WordPress's ubiquity, an attacker with control of 27% of websites on the Internet is a grave threat to the security of the rest of the Internet. I don't know how much infrastructure could withstand that level of DDoS.
#39309: Secure WordPress Against Infrastructure Attacks

Scott has already published several security articles, with Guide to Automatic Security Updates For PHP Developers arguably being the most important one for anyone designing and creating a CMS.

Just about every CMS, from Drupal to WordPress to Joomla, uses a weak update mechanism: if an attacker manages to take control of the update server(s), there's no additional proof they need in order to issue new updates. This poses a real threat to the stability of the web.

Securing auto-updates

For WordPress, a federated authentication model is proposed.

It consists of 3 key areas, as Scott explains:

1. Notaries (WordPress blogs or other services that opt in to hosting/verifying the updates) will mirror a Merkle tree which contains (with timestamps and signatures):
--- Any new public keys
--- Any public key revocations
--- Cryptographic hashes of any core/extension updates

2. WordPress blogs will have a pool of notaries they trust explicitly. [...]

3. When an update is received from the server, after checking the signature against the WP core's public key, they will poll at least one trusted Notary [..]. The Notary will verify that the update exists and matches the checksum on file, and respond with a signed message containing:
--- The challenge nonce
--- The response timestamp
--- Whether or not the update was valid

This will be useful in the event that the core signing key is ever compromised by a sophisticated adversary: If they attempt to issue a silent, targeted update to a machine of interest, they cannot do so reliably [..].
#39309: Secure WordPress Against Infrastructure Attacks

This way, in order to compromise the update system, you need to trick the notaries too to accept the false update. It's no longer merely dependent on the update system itself, but uses a set of peers to validate each of those updates.
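
The Merkle tree the notaries mirror can be illustrated with plain sha256sum. The leaf contents below are made up for illustration; the real tree would hold the public keys, revocations and update hashes listed above.

```shell
# Two illustrative leaves: the hash of an update artifact and of a new
# public-key announcement (both contents are placeholders).
h_update=$(printf '%s' 'core-update-contents'  | sha256sum | awk '{print $1}')
h_pubkey=$(printf '%s' 'new-public-key-record' | sha256sum | awk '{print $1}')

# A parent node (here, the root) hashes the concatenation of its
# children, so vouching for one root commits a notary to every leaf.
root=$(printf '%s%s' "$h_update" "$h_pubkey" | sha256sum | awk '{print $1}')
echo "merkle root: $root"
```

A notary that recomputes the same root from the published leaves can confirm that the update it saw is the one everyone else saw; a targeted, silent update would produce a root nobody else can reproduce.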

Show Me The Money Code

The first patches have already been proposed; it's now up to the WordPress security team to evaluate them and raise any concerns they might have: patch1 and patch2.

Most of the work comes from a sodium_compat PHP package that implements the features provided by libsodium, a modern and easy-to-use crypto library.

Source: #39309

Because WordPress supports PHP 5.2.4 and higher (this is an entirely different security threat to WordPress, but let's ignore it for now), a pure PHP implementation of libsodium is needed since the PHP binary extensions aren't supported that far back. The pecl/libsodium extension requires at least PHP 5.4 or higher.

Here's hoping the patches get accepted and can be used soon, as I'm pretty sure there are a lot of parties interested in getting access to the WordPress update infrastructure.

by Mattias Geniar at January 16, 2017 07:30 AM

Chris Siebenmann

Linux is terrible at handling IO to USB drives on my machine

Normally I don't do much with USB disks on my machine, either flash drives or regular hard drives. When I do, it's mostly to do bulk read or write things such as blanking a disk or writing an installer image to a flash drive, and I've learned the hard way to force direct IO through dd when I'm doing this kind of thing. Today, for reasons beyond the scope of this entry, I was copying a directory of files to a USB flash drive, using USB 3.0 for once.

This simple operation absolutely murdered the responsiveness of my machine. Even things as simple as moving windows around could stutter (and fvwm doesn't exactly do elaborate things for that), never mind doing anything like navigating somewhere in a browser or scrolling the window of my Twitter client. It wasn't CPU load, because ssh sessions to remote machines were perfectly responsive; instead it seemed that anything that might vaguely come near doing filesystem IO was extensively delayed.

(As usual, ionice was ineffective. I'm not really surprised, since the last time I looked it didn't do anything for software RAID arrays.)

While hitting my local filesystems with a heavy IO load will slow other things down, it doesn't do it to this extent, and I wasn't doing anything particularly IO-heavy in the first place (especially since the USB flash drive was not going particularly fast). I also tried out copying a few (big) files by hand with dd so I could force oflag=direct, and that was significantly better, so I'm pretty confident that it was the USB IO specifically that was the problem.

I don't know what the Linux kernel is doing here to gum up its works so much, and I don't know if it's general or specific to my hardware, but it's been like this for years and I wish it would get better. Right now I'm not feeling very optimistic about the prospects of a USB 3.0 external drive helping solve things like my home backup headaches.
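For what it's worth, the mitigation most often suggested for exactly this symptom is to cap how much dirty page cache the kernel may accumulate before forcing writeback, so a slow USB device can't buffer huge amounts of data and then stall everything else flushing it. A hedged sketch (the file name and values are illustrative, not a recommendation):

```
# /etc/sysctl.d/90-usb-writeback.conf -- illustrative values only
vm.dirty_background_bytes = 16777216
vm.dirty_bytes = 50331648
```

Whether this actually helps seems to vary by kernel version and hardware, which fits the "it's been like this for years" experience above.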

(I took a look with vmstat to see if I could spot something like a high amount of CPU time in interrupt handlers, but as far as I could see the kernel was just sitting around waiting for IO all the time.)

PS: We have more modern Linux machines with USB 3.0 ports at work, so I suppose I should do some tests with one just to see. If this Linux failure is specific to my hardware, it adds some more momentum for a hardware upgrade (cf).

(This elaborates on some tweets of mine.)

by cks at January 16, 2017 06:33 AM

January 15, 2017

Chris Siebenmann

Link: Let's Stop Ascribing Meaning to Code Points

Manish Goregaokar's Let's Stop Ascribing Meaning to Code Points starts out with this:

I've seen misconceptions about Unicode crop up regularly in posts discussing it. One very common misconception I've seen is that code points have cross-language intrinsic meaning.

He goes on to explain the ways that this is dangerous and how tangled this area of Unicode is. I knew little bits of this already, but apparently combining characters are only the tip of the iceberg.

(via, and see also.)

by cks at January 15, 2017 09:55 PM

January 13, 2017

Errata Security

About that Giuliani website...

Rumors are that Trump is making Rudy Giuliani some sort of "cyberczar" in the new administration. Therefore, many in the cybersecurity community scanned his website "" to see if it was actually secure from hackers. The results have been laughable, with out-of-date software, bad encryption, unnecessary services, and so on.

But here's the deal: it's not his website. He just contracted with some generic web designer to put up a simple page with just some basic content. It's there only because people expect if you have a business, you also have a website.

That website designer in turn contracted some basic VPS hosting service from Verio. It's a service Verio exited around March of 2016, judging by the archived page.

The Verio service promised "security-hardened server software" that they "continually update and patch". According to the security scans, this is a lie, as the software is all woefully out-of-date. According to the OS fingerprint, the FreeBSD image it uses is 10 years old. The security is exactly what you'd expect from a legacy hosting company that's shut down some old business.

You can probably break into Giuliani's server. I know this because other FreeBSD servers in the same data center have already been broken into, tagged by hackers, or are now serving viruses.

But that doesn't matter. There's nothing on Giuliani's server worth hacking. The drama over his security, while an amazing joke, is actually meaningless. All this tells us is that Verio/ is a crappy hosting provider, not that Giuliani has done anything wrong.

by Robert Graham ( at January 13, 2017 05:21 AM

January 10, 2017

Evaggelos Balaskas

Tools I use daily

Post inspired by:


Operating System

I use Archlinux as my primary Operating System. I am currently running Archlinux (since 2009) in all my boxes (laptop/workpc/homepc/odroid-c1). In the data center, I have CentOS on the bare-metal, and CentOS in the VM(s). A windows VM exists for work purposes on my workpc.



For the last few years I have been running fluxbox, but I used to work on xfce. Thunar (the xfce file browser) is my first and only choice, and lilyterm is my terminal emulator, with tmux as my multiplexer. I used to run GNU screen for a decade!

I use arandr for desktop layout (sharing my screen to an external monitor or the TV).


Disk / FileSystem

All my disks are encrypted and I use both ext4 and btrfs on my systems. I really like btrfs (subvolumes) and I use raid-0 and raid-1, but no raid-5 or raid-6 yet. I also have LVM on my laptop, as I cannot change the SSD easily.



Email

Mostly Thunderbird, but I still use mutt in a terminal or over an ssh session.


Editor + IDE

Vim 99% of my time.

for short-time notes: mousepad and when feeling to use a GUI, I use geany.



Browsers

Multiple instances of Firefox, Chromium, Firefox Nightly, Tor Browser and vimprobable2. I used to run midori but I've dropped it. I also have multiple profiles on Firefox! I keep them all in private-mode or incognito, all via a SOCKS proxy (even Tor Browser) with remote DNS (when possible).




but when needed, smuxi or pidgin


Blog / Website

flatpress: no database, static pages, but a dynamic framework written in PHP. There is some custom code on it, but I keep a separate (off-the-web) clone with my custom changes. Recently I added Markdown support and some JavaScript for code highlighting, etc.

I don't tend to write a lot, but I keep personal notes as drafts (unpublished). I also keep a (wackowiki) personal online note-keeping wiki on my domain.


Version Control

Mostly mercurial, but also git. I have a personal hg server (via ssh) for my code, files, notes, etc.



Media

VLC only, for media and podcasts; mirage or feh for image display; gimp for image manipulation.




I wake up, make my double espresso at home and drink it while commuting to work. The 20-minute commute gives the coffee enough time to wake my brain. When at work, I mostly rant about everything.

and alcohol when needed ;)



My fluxbox menu has fewer than 15 apps; I've put only my daily-use programs there, and I try to keep distractions on my desktop to a minimum. I keep notifications disabled for most apps and I mostly work in full screen to minimize input from running apps.

Tag(s): tools

January 10, 2017 10:01 PM

Everything Sysadmin

How Stack Overflow plans to survive the next DNS attack

My coworker did a bang-up job on this blog post. It explains a lot about how DNS works, how the Dyn DDOS attack worked (we missed it because we don't use Dyn), and the changes we made so that we'll avoid similar attacks when they come.

How Stack Overflow plans to survive the next DNS attack

by Tom Limoncelli at January 10, 2017 03:00 PM

Errata Security

NAT is a firewall

NAT is a firewall. It's the most common firewall. It's the best firewall.

I thought I'd point this out because most security experts might disagree, pointing to some "textbook definition". This is wrong.

A "firewall" is anything that establishes a barrier between some internal (presumably trusted) network and the outside, public, and dangerous Internet where anybody can connect to you at any time. A NAT creates exactly that sort of barrier.

What other firewalls (the SPI packet filters) provide is the ability to block outbound connections, not just incoming ones. That's nice, but it's not a critical feature. Indeed, few organizations use firewalls that way; it just causes complaints when internal users cannot access Internet resources.

Another way of using firewalls is to specify connections between a DMZ and an internal network, such as a web server exposed to the Internet that needs a hole in the firewall to access an internal database. While not technically part of the NAT definition, it's a feature of all modern NATs. It's the only way to get some games to work, for example.
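As a sketch of both behaviours -- the address-translation barrier and the deliberate DMZ hole -- here is what a typical Linux home-router NAT looks like in iptables-restore format (interface names and addresses are made up for illustration):

```
*nat
# Rewrite outbound source addresses: the NAT barrier itself. Nothing
# from outside can initiate a connection inward.
-A POSTROUTING -o eth0 -j MASQUERADE
# The one deliberate hole: forward inbound HTTPS to an internal host.
-A PREROUTING -i eth0 -p tcp --dport 443 -j DNAT --to-destination 192.168.1.10:443
COMMIT
```

Everything not explicitly forwarded simply has no translation entry, which is the "fails closed" property discussed below.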

There are already more than 10 billion devices on the Internet, including homes with many devices, as well as most mobile phones. This means that NAT is the most common firewall. The reason hackers find it difficult to hack into iPhones is partly because they connect to the Internet through carrier-grade NAT. When hackers used "alpine" as the backdoor in Cydia, they still had to exploit it over local WiFi rather than the carrier network.

Not only is NAT the most common firewall, it's the best firewall. Simple SPI firewalls that don't translate addresses have an inherent hole in that they "fail open". It's easy to apply the wrong firewall ruleset, either permanently or just for a moment. You see this on internal IDS, where for no reason there's suddenly a spike of attacks against internal machines because of a bad rule. Every large organization I've worked with can cite examples of this.

NAT, on the other hand, fails closed. Common mistakes shut down access to the Internet rather than open up access from the Internet. The benefit is so compelling that organizations with lots of address space really need to give it up and move to private addressing instead.

The definition of "firewall" is malleable. At one time it included explicit and transparent proxies, for example, which were once the most popular type. These days, many people think of only stateful packet inspection filters as the "true" firewall. I take the more expansive view of things.

The upshot is this: NAT is by definition a firewall. It's the most popular firewall. It's the best firewalling technology.

Note: Of course, no organization should use firewalls of any type. They break the "end-to-end" principle of the Internet, and thus should be banned by law.

by Robert Graham ( at January 10, 2017 05:22 AM

No, Yahoo! isn't changing its name

Trending on social media is how Yahoo is changing its name to "Altaba" and CEO Marissa Mayer is stepping down. This is false.

What is happening instead is that everything we know of as "Yahoo" (including the brand name) is being sold to Verizon. The bits that are left are a skeleton company that holds stock in Alibaba and a few other companies. Since the brand was sold to Verizon, that investment company could no longer use it, so it chose "Altaba". Since 83% of its investment is in Alibaba, "Altaba" makes sense. It's not like this new brand name means anything -- the skeleton investment company will be wound down in the next year, either as a special dividend to investors, sold off to Alibaba, or both.

Marissa Mayer is an operations CEO. Verizon didn't want her to run their newly acquired operations, since the entire point of buying them was to take the web operations in a new direction (though apparently she'll still work a bit with them through the transition). And of course she's not an appropriate CEO for an investment company. So she had no job left -- she made her own job disappear.

What happened today is an obvious consequence of Alibaba going IPO in September 2014. It meant that Yahoo's stake of 16% in Alibaba was now liquid. All told, the investment arm of Yahoo was worth $36-billion while the web operations (Mail, Fantasy, Tumblr, etc.) was worth only $5-billion.

In other words, Yahoo became a Wall Street mutual fund that inexplicably also offered web mail and cat videos.

Such a thing cannot exist. If Yahoo didn't act, shareholders would start suing the company to get their money back. That $36-billion in investments doesn't belong to Yahoo; it belongs to its shareholders. Thus, the moment the Alibaba IPO closed, Yahoo started planning how to separate the investment arm from the web operations.

Yahoo had basically three choices.
  • The first choice is simply to give the Alibaba (and other investment) shares as a one-time dividend to Yahoo shareholders. 
  • A second choice is to split the company in two, one of which has the investments, and the other the web operations. 
  • The third choice is to sell off the web operations to some chump like Verizon.

Obviously, Marissa Mayer took the third choice. Without a slush fund (the investment arm) to keep it solvent, Yahoo didn't feel it could run its operations profitably without integration with some other company. That meant it either had to buy a large company to integrate with Yahoo, or sell the Yahoo portion to some other large company.

Every company, especially an Internet one, has a legacy value. It's the amount of money you'd get from firing everyone, stopping investment in the future, and just raking in a stream of declining revenue year after year. It's the fate of early Internet companies like Earthlink and Slashdot. It's what I documented with Earthlink [*], which continues to offer email to subscribers, but spends only enough to keep the lights on, not even upgrading to the simplest of things like SSL.

Presumably, Verizon will try to make something of a few of the properties. Apparently, Yahoo's Fantasy sports stuff is popular, and will probably be rebranded as some new Verizon thing. Tumblr is already its own brand name, independent of Yahoo, and thus will probably continue to exist as its own business unit.

One of the weird things is Yahoo Mail. It's permanently bound to the "" domain, so you can't do much with the "Yahoo" brand without bringing Mail along with it. Though at this point, the "Yahoo" brand is pretty tarnished. There's not much new you can put under that brand anyway. I can't see how Verizon would want to invest in that brand at all -- it'll just milk it for what it can over the coming years.

The investment company cannot long exist on its own. Investors want their money back, so they can make future investment decisions on their own. They don't want the company to make investment choices for them.

When Yahoo made its initial $1-billion investment for 40% of Alibaba in 2005, it did not do so because it was a good "investment opportunity", but because Yahoo believed it was a good strategic investment, such as providing an entry into the Chinese market, or providing an e-commerce arm to compete against eBay and Amazon. In other words, Yahoo didn't consider it a good way of investing its money, but a good way to create a strategic partnership -- one that just never materialized. From that point of view, the Alibaba investment was a failure.

In 2012, Marissa Mayer sold off 25% of Alibaba, netting $4-billion after taxes. She then lost all $4-billion on the web operations. That stake would be worth over $50-billion today. You can see the problem: companies with large slush funds just fritter them away keeping operations going. Marissa Mayer abused her position of trust, playing with money that belonged to shareholders.

Thus, Altaba isn't going to play with shareholders' money. It's a skeleton company, so there's no strategic value to its investments. It can make no better investment choices than its shareholders can with their own money. Thus, the only purpose of the skeleton investment company is to return the money to the shareholders. I suspect it'll choose the most tax-efficient way of doing this, like selling the whole thing to Alibaba, which would just exchange the Altaba shares for Alibaba shares, with a 15% bonus representing the value of the other Altaba investments. Either way, if Altaba is still around a year from now, it's because its board is skimming money that doesn't belong to them.

Key points:

  • Altaba is the name of the remaining skeleton investment company, the "Yahoo" brand was sold with the web operations to Verizon.
  • The name Altaba sucks because it's not a brand name that will stick around for a while -- the skeleton company is going to return all its money to its investors.
  • Yahoo had to spin off its investments -- there's no excuse for 90% of its market value to be investments and 10% in its web operations.
  • In particular, the money belongs to Yahoo's investors, not Yahoo the company. It's not some sort of slush fund Yahoo's executives could use. Yahoo couldn't use that money to keep its flailing web operations going, as Marissa Mayer was attempting to do.
  • Most of Yahoo's web operations will go the way of Earthlink and Slashdot, as Verizon milks the slowly declining revenue while making no new investments in it.

by Robert Graham ( at January 10, 2017 04:13 AM

January 09, 2017

That grumpy BSD guy

A New Year, a New Round of pop3 Gropers from China

They've got a list, and they're sticking to it. Do they even know or care it's my list of spamtraps?

Yes, the Chinese are at it again. Or rather, machines with IP addresses that belong in a small set of Chinese province networks have started a rather intense campaign of trying to access the pop3 mail retrieval protocol on a host in my care, after a longish interval of near-total inactivity.

This is the number of failed pop3 login attempts to my system per day so far in 2017:

January 1:     4
January 2:   145
January 3:    20
January 4:    51
January 5:    32
January 6:    36
January 7:  4036
January 8:  5956
January 9:  5769

Clearly, something happened on January 7th, and whatever started then has not stopped yet. On that day we see a large surge in failed pop3 logins, sustained over the next few days, and almost exclusively attempts at the username part of entries from my list of spamtrap addresses. Another notable feature of this sequence of attempts is that they come almost exclusively from a small set of Chinese networks.

The log of the failed attempts is available in raw form here, while this spreadsheet summarises the log data in a format oriented to IP address and username pairs and the attempts at each. The spreadsheet also contains netblock information and the country or territory each range is registered to. (Note: when importing the .csv, please specify the "User name" column as text, otherwise conversion magic may confuse matters.)

The numbers for January 7th onwards would have been even higher had it not been for a few attempts to access accounts that actually exist; my reaction to those was to block (for 24 hours only) the entire netblock given in the whois info for the offending IP address. Some of those blocks were quite substantial. I've also taken the liberty of removing the entries with real usernames from the logs.

Now despite urging from friends to publish quickly, I've silently collected data for a few days (really just a continuation of the collecting that started with last year's episode described in the Chinese Hunting Chinese Over POP3 In Fjord Country article, which in turn contains links to the data that by now covers almost a full year).

Now disregarding the handful of real user IDs I've already removed from the data set, the only new user IDs we have seen this year are:


The rest were already in the spamtraps list, as user name parts. As you will have guessed, those two have been duly included there as well, with appended in order to form a somewhat believable spamtrap email address.

What, then, can we expect to see over the next few days?

The progression so far has proceeded from trap user names starting with 0, ascended through the numerics, and has now (January 9) moved on to the early alphabetics. The list of spamtraps is just shy of 35,000 entries, and I assume the entries I see here come out of some larger corpus that our somewhat inept cyber-criminals use.

If you too are seeing larger than usual numbers of pop3 login failures and anything even vaguely resembling the patterns of mischief described here, I would like to hear from you. If your logs follow a format somewhat resembling mine, it is most likely trivial to modify the scripts (in the same directories as the data) to extract the data into the slightly more database- or spreadsheet-friendly CSV format.
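Assuming a syslog-style mail log (the "LOGIN FAILED" marker and field positions below are my own invention -- adjust to whatever your pop3 daemon logs), a per-day summary like the counts above reduces to a short pipeline, shown here against a self-contained sample:

```shell
# Build a tiny sample log so the pipeline can be run as-is.
printf '%s\n' \
  'Jan  7 01:02:03 mail pop3d: LOGIN FAILED, user=0sales, ip=[203.0.113.4]' \
  'Jan  7 02:03:04 mail pop3d: LOGIN FAILED, user=0info, ip=[203.0.113.5]' \
  'Jan  8 03:04:05 mail pop3d: LOGIN FAILED, user=0web, ip=[203.0.113.6]' \
  > sample.log
# Count failed logins per day, then emit "date,count" CSV lines.
grep 'LOGIN FAILED' sample.log | awk '{print $1" "$2}' | sort | uniq -c \
  | awk '{print $2" "$3","$1}'
```

Swapping the first awk for a sed that pulls out the user= or ip= field gives the per-username or per-address view instead.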

From my perch here it is difficult to determine whether the people responsible for the networks that feature prominently in the data are cooperating with the cybercriminals or whether they are simply victims of their own substandard security practices.

If you are in a position to shed light on that, I would like to hear from you, and I will do my best to protect your anonymity unless you specify otherwise.

In the meantime, expect the data, (both the full set starting in January 2016 and the 2017-only set) to be updated at frequent, quasi-random intervals.

If you need some background on what the spamtrap list is and some history of related incidents over the last few years, I can recommend reading these articles:

Hey, spammer! Here's a list for you! (July 2007) - a light introduction to greytrapping
Maintaining A Publicly Available Blacklist - Mechanisms And Principles (April 2013) - on greytrapping principles and practice
Effective Spam and Malware Countermeasures - Network Noise Reduction Using Free Tools (2008 - 2014) - a more thorough treatment of the whole spam and malware complex
Password Gropers Take the Spamtrap Bait (August 2014) - the first time I noticed pop3 logins for my spamtraps, and of course
Chinese Hunting Chinese Over POP3 In Fjord Country (August 2016) - about a previous bizarre episode involving Chinese networks and pop3 activity.

by Peter N. M. Hansteen ( at January 09, 2017 08:17 PM

Ansible-cmdb v1.19: Generate a host overview of Ansible facts.

I've just released ansible-cmdb v1.19. Ansible-cmdb takes the output of Ansible's fact gathering and converts it into a static HTML overview page containing system configuration information. It supports multiple templates (fancy html, txt, markdown, json and sql) and extending information gathered by Ansible with custom data.

This release includes the following bugfixes:

  • Always show stack trace on error and include class name.
  • Exit with proper exit codes.
  • Exclude certain file extensions from consideration as inventories.
  • Improved error reporting and lookups of templates.
  • Improved error reporting when specifying inventories.

As always, packages are available for Debian, Ubuntu, Red Hat, CentOS and other systems. Get the new release from the Github releases page.

by admin at January 09, 2017 01:52 PM

Bexec: Execute script in buffer and display output in buffer. Version 0.10 released.

After almost a year without releases, I've made a new release of Bexec today. It's a minor feature release that brings a new setting to Bexec: bexec_splitsize. This setting controls the default size of the output window. You can set it in your .vimrc as follows:

let g:bexec_splitsize=20

This will always make the output window 20 lines high.

It's been almost exactly ten years since the first version of Bexec was released. Version 0.1 was uploaded on January 30th of 2007. That makes about one release per year on average for Bexec ;-) Perhaps it's time for v1.0 after all this time…

by admin at January 09, 2017 06:55 AM

Sarah Allen

the danger of a single story

Chimamanda Ngozi Adichie’s the danger of a single story illustrates how we are influenced by the stories we read and the stories we tell.

Chimamanda Ngozi Adichie speaking

She introduces the talk telling about how reading British and American children’s books influenced her own childish writings that featured foreign landscapes and experiences, rather than drawing from her own life. I remembered how my mom pointed out to me years later the drawings I had made when we lived in Saint Lucia. All my houses had chimneys, even though we lived in this very hot, tropical climate with no fireplaces.

She tells about her experience of negative bias where well-meaning people made assumptions about Africa. She also shares how she inadvertently made assumptions about others based on a single narrative that excluded the possibility of other attributes and experience.

It is impossible to talk about the single story without talking about power. There is a word, an Igbo word, that I think about whenever I think about the power structures of the world, and it is “nkali.” It’s a noun that loosely translates to “to be greater than another.” Like our economic and political worlds, stories too are defined by the principle of nkali: How they are told, who tells them, when they’re told, how many stories are told, are really dependent on power… Power is the ability not just to tell the story of another person, but to make it the definitive story of that person.

It resonates with me how the negative stories flatten her experience. The creation of stereotypes by such storytelling is problematic not because they are entirely false, but because they are incomplete. “They make one story become the only story.”

“The consequence of the single story is this: It robs people of dignity. It makes our recognition of our equal humanity difficult. It emphasizes how we are different rather than how we are similar… when we reject the single story, when we realize that there is never a single story about any place, we regain a kind of paradise.”

The post the danger of a single story appeared first on the evolving ultrasaurus.

by sarah at January 09, 2017 12:06 AM

haproxy: restrict specific URLs to specific IP addresses

This snippet shows you how to use haproxy to restrict certain URLs to certain IP addresses. For example, to make sure your admin interface can only be accessed from your company IP address.
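A minimal sketch of the pattern, with all the names (frontend, backend, path and the office address) as placeholders:

```
frontend www
    bind :80
    # Match the URLs we want to protect and the addresses allowed in.
    acl is_admin path_beg /admin
    acl office_ip src 203.0.113.4
    # Deny admin requests coming from anywhere else.
    http-request deny if is_admin !office_ip
    default_backend webservers
```

The same src acl can take a CIDR range (e.g. 203.0.113.0/24) or be loaded from a file with `src -f`.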

January 09, 2017 12:00 AM

January 08, 2017

Steve Kemp's Blog

Patching scp and other updates.

I use openssh every day, be it the ssh command for connecting to remote hosts, or the scp command for uploading/downloading files.

Once a day, or more, I forget that scp uses the non-obvious -P flag for specifying the port, not the -p flag that ssh uses.

Enough is enough. I shall not file a bug report against the Debian openssh-client package, because no doubt compatibility with both upstream and other distributions is important. But damnit, I've had enough.

apt-get source openssh-client shows the appropriate code:

    fflag = tflag = 0;
    while ((ch = getopt(argc, argv, "dfl:prtvBCc:i:P:q12346S:o:F:")) != -1)
          switch (ch) {
            case 'P':
                    addargs(&remote_remote_args, "-p");
                    addargs(&remote_remote_args, "%s", optarg);
                    addargs(&args, "-p");
                    addargs(&args, "%s", optarg);
                    break;
            case 'p':
                    pflag = 1;
                    break;

Swapping those two flags around, and updating the format string appropriately, was sufficient to do the necessary.

In other news I've done some hardware development, using both Arduino boards and the WeMos D1-mini. I'm still at the stage where I'm flashing lights, and doing similarly trivial things:

I have more complex projects planned for the future, but these are on-hold until the appropriate parts are delivered:

  • MP3 playback.
  • Bluetooth-speakers.
  • Washing machine alarm.
  • LCD clock, with time set by NTP, and relay control.

Even with a few LEDs though I've had fun, for example writing a trivial binary display.

January 08, 2017 04:39 PM

January 07, 2017

Sarah Allen

what i’ve learned in the past year

At first I thought I hadn’t learned anything new, just the same lessons I keep learning every year. Then I realized that I’ve learned new techniques that let me apply what I’ve previously understood in ways that work better.

Lessons I need to keep learning:

  1. Everything is about the people. My relationships with other humans are more important to me than anything else. Since much of my life is spent making software, I think a lot about how this applies to my work. Software is really made of people: people’s ideas, conflicts, errors, understanding or misunderstanding the needs of others or the limits of machines. We need to work well with other people in order to do everything, or at least to do the things that really matter. The so-called soft skills are hard.
  2. Why is more important than what. If we agree on doing something, but we’re doing it for different reasons, it typically doesn’t have happy outcomes for anyone. For that same reason, working with people who share your values is incredibly powerful. Our values influence our decision-making, sometimes so fundamentally that we don’t realize that we are making a decision at all.

This year I combined these lessons into a very different approach to finding my way in the world. Part of the reason I can do this is because I know a lot of people and have grown comfortable outside of my comfort zone. Or rather, I have discovered that what I used to think of as boundaries create a false comfort, and have gained experience in creating boundaries in my interactions which create safety in new experiences.

Find your people.

When I started doing business development for my own consulting company, I realized that there are different ways of doing business that co-exist in our capitalist economy. There are business people who are competing with each other to win, where in order to win, someone else has to lose. Success is gained at someone else’s expense. There are a lot of successful people who work that way, but it’s not how I work.

In my very first startup, I came up with a very simple formula for understanding this business of making software: if you make something that people need, especially if it’s something they need to do their work, they will be happy to give you some of their money.

My idea of a successful business transaction is when I am happy to do the work because I get paid to create something wonderful in collaboration with smart, interesting people. Then what I get paid feels like a lot of money, plus I’m gaining experience that I value. If we negotiate well and set expectations effectively, then the customer feels like it wasn’t that much money relative to the value of this awesome thing we created together. Of course, every business deal didn’t work out that way even with the best of intentions, but I learned to notice which people weren’t even trying to create that situation.

I applied this idea earlier this year when looking for my new job. I talked to people who I had really enjoyed working with, who I felt were doing interesting things. Those people introduced me to other people. I didn’t have time to talk to everyone I wanted to meet or reconnect with, so I followed my heart and met with people who I most wanted to talk with. It felt a little random at times, yet it was quite intentional. I prioritized the companies with the people who told stories about their work that inspired me, or made me laugh, where it seemed like some things would be easy and most of the difficult things would be fun.

I spent more time talking with people who were honest, whose truths reflected my own, who caused me to think and reflect. I also prioritized people who also wanted to work with me. That sounds kind of obvious in a job search, but I mean something very specific. Of all the people who would like a person with my skills on their team, there is a smaller group who actually want to work with me, with all my quirks and diverse interests, where seemingly unrelated talents and skills are valued as part of the team.

I ended up taking a job at Google, and inside this huge company where there are lots of different kinds of people, I found a community of like-minded folk. Most of those people don’t even know each other, but they help me stay connected to my own values and help me navigate a strange new world. I stay connected with other industry colleagues through Bridge Foundry, a wide network of civic hackers, and small gatherings of friends. Every week I try to have lunch or coffee with someone awesome who I don’t work with day-to-day. Allowing myself to care about the kinds of people I work with, and staying connected with a wider group of wonderful people, has had a profound, enriching effect on my day-to-day life.

Be nice to the other humans.

I used to feel like I had to figure out who the good and the bad people were, or the good people and the other people who needed to be enlightened, but just didn’t know it yet. I had to try really hard not to be judgmental, and it was really hard to be nice to people who didn’t meet my standards, except, more often than not, I didn’t actually meet my own standards. Despite my best efforts, I kept screwing up.

I got excited about agile development and lean startup where you are supposed to fail fast and learn, but we can’t A/B test relationships. I realized that sometimes talking about a thing is a whole message unto itself. You are saying “I think our spending time on this is the most important thing we each could be doing right now.” Of course, sometimes it is, but often not at all.

If something might be a misunderstanding, it might not be that important for either of us. If I’m not going to be working with you, or might not even see you again this year or ever, and you aren’t actually hurting anyone, maybe I should just be nice. Maybe I should give you the benefit of the doubt, think of the best possible reason you might have said that, or focus on something else you said that was much more interesting. Then when we meet again, if it’s important and still relevant, we can figure it out. More likely, things will have changed, and we will have changed.

We invent ourselves in each moment. These shared experiences are precious moments of our lives. It may seem obvious or inconsequential, but being nice, genuinely nice, makes the day just a little bit brighter and leaves the way open for new opportunities.

The post what i’ve learned in the past year appeared first on the evolving ultrasaurus.

by sarah at January 07, 2017 06:31 PM

January 06, 2017

Sean's IT Blog

Horizon 7.0 Part 12–Understanding Horizon Remote Access

When you decouple the user from the physical hardware that sits on their desk, you provide new opportunities to change the way they work because they are no longer tethered to their desk. If you can provide secure remote access to their desktop, they are no longer tied to their VPN connection or corporate laptop.

Horizon View provides a secure method for granting users access to their desktops from anywhere with an Internet connection on any device without needing a VPN connection.  Now that a desktop pool has been set up and desktops are provisioned, it’s time to set up that remote access.

The Security Server

The View Security Server is VMware’s original method of addressing remote access.  This component of the Horizon View environment contains a subset of the Connection Server components, and it is designed to sit in a DMZ and act as a gateway for Horizon View Clients.  It’s essentially a reverse proxy for your View environment.

Each Security Server that is deployed needs a corresponding Connection Server, and they are paired during the installation process.  Because the Security Server is an optional component, each Connection Server is not required to have one, and a Connection Server cannot be paired to more than one Security Server.

Each Security Server also needs a static IP address.  If it is externally facing, it will need to have a publicly addressable static IP.  This IP address does not need to be configured on the server’s network card as both Static 1:1 NAT and PAT work with Horizon View.

Since the Security Server is built on a subset of Connection Server components, it requires a Windows Server-based operating system.  This may require putting Windows servers into a DMZ network, and this can present some security and management challenges.

Security Server Alternatives

There are two alternatives for providing remote access to Horizon environments if you don’t want to place Windows servers into a DMZ environment.  These two alternatives are the Horizon Access Point, a hardened purpose-built remote access appliance for Horizon and Airwatch, and the F5 Access Policy Manager for Horizon. 

The Horizon Access Point was officially released for Horizon environments with Horizon 6.2.2, and it has received new features and improvements with every major and minor Horizon release since.  In addition to being a Security Server replacement, it can also act as a reverse proxy for VIDM and as an endpoint for Airwatch Tunnels to connect on-premises services with a cloud-hosted Airwatch environment.  The Access Point is designed to be disposable.  When the Access Point needs an upgrade or a settings change (such as a certificate replacement), or when it breaks, the appliance is meant to be discarded and a new one deployed in its place.  The Access Point also has no management interface, but it does have a REST API that can be used to view configuration details and monitor the number of connections passing through it.

The F5 Access Policy Manager is a feature of the F5 Application Delivery Controller.  Access Policy Manager provides context-aware secure remote access to applications and other resources.  One of the features of APM is a Horizon Proxy.  The Horizon Proxy can authenticate users to the Horizon environment and handle both PCoIP and Blast connections.  F5 APM is configured using a Horizon iApp Rule – a template with all of the F5 rules required for Horizon and a graphical interface for configuring it to your particular environment.  The APM feature is licensed separately from other F5 features, and there is an additional cost for F5 APM licensing.

The table below outlines the features of the Security Server, Access Point, and F5 APM.


                          Security Server                 Access Point                                           F5 Access Policy Module
Platform                  Windows Server                  Virtual appliance                                      Physical or virtual appliance
Protocol Support          PCoIP, Blast Extreme            PCoIP, Blast Extreme                                   PCoIP, Blast Extreme
Interaction with Horizon  Paired with Connection Servers  HTTPS connection to load-balanced Connection Servers   HTTPS connection to pool of Connection Servers
Two-Factor Auth Support   Handled by Connection Servers   RSA, RADIUS-based                                      RSA, RADIUS-based
Deployment Method         Manual                          Scripted                                               GUI-based

Security Server Firewall Ports

In order to enable remote access, a few ports need to be opened on any firewalls that sit between the network where the Security Server has been deployed and the Internet.  If the server is deployed into a  DMZ, the firewall will also need to allow traffic between the Security Server and the Connection Server.

The rules that are required on the front-end, Internet-facing firewall are:

  • HTTP – TCP 80 In
  • HTTPS – TCP 443 In
  • HTTPS – UDP 443 In (for Blast Extreme UDP Connections)
  • HTTPS – TCP 8443 both directions (if Blast is used with the Security Server)
  • PCoIP – TCP 4172 In, UDP 4172 both directions
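Once those rules are in place, a quick external sanity check helps confirm the TCP ports actually answer. The sketch below is a minimal Python probe, not a full validation: the hostname is hypothetical, and the UDP ports and bidirectional rules still have to be verified separately.

```python
import socket

# Front-end TCP ports required for Horizon remote access; UDP 443/4172
# and the return-traffic rules must be checked by other means.
PORTS = {
    80: "HTTP",
    443: "HTTPS",
    4172: "PCoIP",
    8443: "Blast (Security Server)",
}

def check_tcp_port(host, port, timeout=3):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def audit(host):
    """Map each required port number to whether it currently answers on host."""
    return {port: check_tcp_port(host, port) for port in PORTS}
```

Running `audit("view.example.com")` (a made-up external name for the Security Server) from outside the firewall shows at a glance which rules are working.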

Backend firewall rules between the remote access solution and the Horizon Connection Servers and desktops depend on the remote access solution being configured.  The following table outlines the ports that need to be opened between the DMZ and internal networks.

Port   Protocol                              Zone                             Notes
443    TCP (HTTPS)                           DMZ to Connection Servers        Access Point only
4172   TCP/UDP (PCoIP)                       DMZ to virtual desktop subnets
22443  TCP/UDP (Blast)                       DMZ to virtual desktop subnets
9427   TCP (Client Drive Redirection/MMR)    DMZ to virtual desktop subnets
500    UDP (IPsec)                           DMZ to Connection Servers        Security Server only
4500   UDP (NAT-T ISAKMP)                    DMZ to Connection Servers        Security Server only

If you are deploying your Security Servers in a DMZ configuration with a back-end firewall, you need to configure your firewall to allow IPSEC traffic to the Connection Servers.  These rules depend on whether network address translation is used between the DMZ and Internal network.  For more information on the rules that need to be enabled, please see this VMware KB article.

Note: If you’re using application-aware firewalls like Palo Alto Networks devices, make sure that any application protocols required by Horizon View aren’t blocked between the DMZ and Internal network.  Also, updates to the application signatures or the PCoIP protocol may impact users’ access to virtual desktops.

So Which Should I Use?

The million dollar question when deploying a brand new Horizon environment is: which remote access method should I use?  The answer is “whichever one fits your needs the best.”  When designing remote access solutions for Horizon, it is important to understand the tradeoffs of using the different options and to evaluate options during the pilot phase of the project.

If possible, I would recommend staying away from the Security Server now that there are other options for remote access.  I make this recommendation to clients because the Access Point has feature parity with the Security Server, and it avoids the security and management hassles of deploying Windows Servers into an organization’s DMZ network.

by seanpmassey at January 06, 2017 07:56 PM


The Ultimate Game Boy Talk [video]

This is the video recording of “The Ultimate Game Boy Talk” at 33C3.

I will post the slides in Apple Keynote format later.

If you enjoyed this, you might also like my talks

by Michael Steil at January 06, 2017 01:34 PM


My Favourite Books in 2016

I planned to read 36 books in 2016 and managed to hit that number a few hours before the New Year! The best of those 36 books are listed below.

Business, Management and Leadership

Considering the new role I’ve started in January 2016 (first-time CTO of a growing startup company), my reading last year was heavily geared towards business, management and leadership topics. Here are my favourite books in this category:

  • “The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers” — in my opinion, a must-read book for anybody interested in starting a company or already building one. A treasure trove of great advice for startup founders on building and managing their teams.
  • “Crossing the Chasm: Marketing and Selling High-Tech Products to Mainstream Customers” — the author explains why so many companies that find an initial product-market fit subsequently fail to grow into leaders of their respective markets and often die a slow and painful death. The concept of the chasm and, especially, the idea of the whole product were very powerful for my understanding of what I had felt in many companies I worked for — mainstream customers cannot use your product unless they are provided with a minimum set of components and services to solve their problem. A very important read for leaders of modern SaaS companies, especially for API/platform enterprises.
  • “Turn the Ship Around!: A True Story of Turning Followers into Leaders” — an inspiring story of a navy captain transforming one of the worst-performing crews in the fleet into a perfectly functioning team by pushing control down to individual team members.
  • “The Score Takes Care of Itself” — an inspiring story of one of the greatest sports team transformations and the man behind it, legendary coach Bill Walsh.


A few more books I found very interesting:

  • “The Collapse of Parenting: How We Hurt Our Kids When We Treat Them Like Grown-Ups” — maybe it is just confirmation bias, but I absolutely loved this book. The author focuses on a few serious problems in today’s parenting and the resulting decline in the achievement and psychological health of American children. He finally managed to put into words something that had been bothering me for the 10 years since moving to Canada. Now that I have become a parent and will have to raise a child in this environment, I was glad to hear that I wasn’t crazy for not agreeing with the approach that is being pushed on modern parents by American society.
  • “Being Mortal: Medicine and What Matters in the End” — one of my favourite authors, Atul Gawande, explores the current state of end-of-life care in the USA, Canada, and Western Europe. Terrifying at first, the book makes you consider your own mortality and think about the choices you are bound to make eventually for yourself and, potentially, for your close family members.
  • “Sapiens: A Brief History of Humankind” — a captivating overview of our history as a human species, from 70,000 years ago until the 20th century: how we evolved, how we affected other species on the planet, and how we ended up where we are today. A long but very interesting read!
  • “The Road To Sparta: Retracing the Ancient Battle and Epic Run that Inspired the World’s Greatest Foot Race” — a fascinating story of Dean Karnazes (one of the most famous ultra-distance runners in the world) and his exploration of the legend of the Marathon. Highly recommended to anybody interested in running.
  • “Catastrophic Care: How American Health Care Killed My Father — and How We Can Fix It” — a very detailed overview of what is broken in US healthcare today. Even if you don’t have anything to do with the US healthcare market, the book is a great collection of stories about the side-effects of what initially looked like great ideas but ended up screwing up the system even further.


I have always been a huge fan of science fiction, and this past year I discovered a few real gems that ended up on my all-time favourites list:

  • “Remembrance of Earth’s Past” (aka “The Three-Body Problem”) series by Chinese author Liu Cixin — a huge universe, highly detailed and powerful characters, a timeline spanning centuries — you can find all of it here. But on top of the standard components of a good space opera, there is a layer of Chinese culture, language, and philosophy that was previously unknown to me.
    This trilogy has become an instant classic for me and is in the top 10 of my all-time favourites, next to Asimov’s “Foundation” and Peter F. Hamilton’s “Void”.
  • Everything from Neil Gaiman! Up until this year, when I first got exposed to his writing, I never realized how much pleasure one could get from reading prose. I’m not sure how he does it, but if he were to publish a book of obituaries or classifieds, I’d be willing to read that too — I enjoyed his English so much! Favourite books so far: “The Graveyard Book” and “The Ocean at the End of the Lane”.

I hope you enjoyed this overview of the best books I’ve read in 2016. Let me know if you liked it!

by Oleksiy Kovyrin at January 06, 2017 01:28 AM

January 05, 2017

Everything Sysadmin

NYCDevOps meetup is re-starting!

Hey NYC-area folks!

The NYC DevOps meetup is springing back to life! Our next meeting will be Tuesday, January 17, 2017 from 6pm-7:30pm. The meeting is at the Stack Overflow NYC headquarters near the financial district. For complete details, visit

From the announcement:

Please join us on January 17th from 6:00 - 7:30 PM at Stack Exchange for our first annual DevOps Mixer. Our goal is to re-engage with our members for an inaugural meet and greet with our new team of organizers, awesome community members, and of course there will be refreshments! Come socialize with us and talk about your experiences, what's new, what you're working on and what you would like to see from the NYC DevOps Meetup.

We're also looking for members of the local community to participate in future meetups by giving some great talks about things that you're working on, and participate on interactive panels. Come with ideas on topics you'd like to hear about!

Finally, please provide your feedback on how we can best serve the NYC DevOps Community members via this survey:

Hope to see you there!


by Tom Limoncelli at January 05, 2017 03:00 PM

Geek and Artist - Tech

iframe-based dashboards don’t work in 2017

At $current_employer (unlike $previous_employer, where all these problems were sorted out), we have great huge TVs in every room but no consistently useful use of them. I love seeing big, beautiful dashboards and KPIs visualised everywhere, but right now we just don’t have that in place. No matter; this is part of my mission to improve engineering practices here, and I’m happy to tackle it.

The last time I felt I had to do this was back in about 2013. My team was fairly small at 2-3 people including myself, and there was no company-wide dashboarding solution in place. The list of commercial and open source solutions was much smaller than it is today. We ended up using a Mac Mini (initially with Safari, later Chrome) and some tab rotation extension to do the job of rotating between various hard-coded HTML pages I had crafted by hand, which aggregated numerous Graphite graphs into a fixed table structure. Hm.

While there are many solutions to displaying dashboards, collecting and storing the data, actually hosting the infrastructure and what drives the TV still seems a bit fiddly. You could try using the TV’s built-in web browser if it is a smart TV (low-powered, usually no saved settings if you turn the TV off, not enough memory, questionable HTML5 support), Chromecast (not independent from another computer), Raspberry Pi (low-powered, not enough memory), or some other small form-factor PC. The ultimate solution will probably require some common infrastructure to be deployed along the lines of Concerto, which I’ve used before but don’t want to wait for that to be set up yet.

The simplest possible solution is to host a small static HTML file on the machine, load it in the browser and have that page rotate through a hard-coded set of URLs by loading them in an iframe. I came up with this code in a few minutes and hoped it would work:

<body style="margin:0px">
  <iframe id="frame"></iframe>
  <script type="text/javascript">
    function rotateDashboard(urls, refreshPeriod) {
      var frame = document.getElementById("frame");
      frame.src = urls[0];

      // Put the current URL on the back of the queue and set the next refresh
      urls.push(urls.shift());
      setTimeout(rotateDashboard, refreshPeriod * 1000, urls, refreshPeriod);
    }

    // Set up iframe
    var frame = document.getElementById("frame");
    frame.height = screen.height;
    frame.width = screen.width;
    frame.style.border = "none";
    frame.seamless = true;

    // Fetch the refresh period and URL list, then start the rotation
    var xhr = new XMLHttpRequest();
    xhr.onload = function(e) {
      var json = JSON.parse(xhr.responseText);
      var refresh = json.refresh;
      var urls = json.urls;

      rotateDashboard(urls, refresh);
    };
    xhr.open("GET", "");
    xhr.send();
  </script>
</body>

For the first couple of locally hosted dashboards and another static website for testing, it worked, but for the first Librato-based dashboard it immediately failed due to the X-Frame-Options header in the response being set to DENY. Not being a frontend-savvy person, I’d only vaguely known of this option but here it actually blocked the entire concept from working.
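In hindsight, a small pre-flight check would have caught this before the iframe silently refused to load. The sketch below (Python standard library only; it conservatively treats any `frame-ancestors` CSP directive as blocking, even though `'self'` would permit same-origin framing) reports whether a candidate dashboard URL can be framed:

```python
from urllib.request import urlopen

def headers_allow_framing(headers):
    """Decide from response headers whether a page may be shown in an iframe.

    Checks X-Frame-Options (DENY / SAMEORIGIN) and the frame-ancestors
    directive of Content-Security-Policy, which modern browsers honour.
    `headers` is any mapping with a .get() method, e.g. resp.headers.
    """
    xfo = (headers.get("X-Frame-Options") or "").upper()
    if xfo in ("DENY", "SAMEORIGIN"):
        return False
    csp = (headers.get("Content-Security-Policy") or "").lower()
    return "frame-ancestors" not in csp

def frameable(url, timeout=5):
    """Fetch url and report whether its headers permit framing."""
    with urlopen(url, timeout=timeout) as resp:
        return headers_allow_framing(resp.headers)
```

Filtering the URL list through `frameable()` before handing it to the rotation script would have flagged the Librato dashboard up front.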

So, TL;DR – you can’t host rotating dashboards in an iframe, given the security settings most browsers and respectable websites obey in 2017. This is probably well-known to anyone who has done any reasonable amount of web coding in their lives, but to a primarily backend/infrastructure person it was a surprise. So, the Earliest Testable Product in this case needs to be with a tab rotation extension in the browser. You might argue this is simpler, but I was looking forward to maintaining configuration in a flexible manner as you can see in the script above. In any case, by the end of today I’ll have such a system running and the team will start to enjoy the benefits of immediate visibility of critical operational data and KPIs!

by oliver at January 05, 2017 09:16 AM

January 04, 2017

Anton Chuvakin - Security Warrior

Annual Blog Round-Up – 2016

Here is my annual "Security Warrior" blog round-up of the top 10 popular posts/topics in 2016. Note that my current Gartner blog is where you go for my recent blogging; all of the content below predates 2011.

  1. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009. Is it relevant now? You be the judge. Succeeding with SIEM requires a lot of work, whether you paid for the software or not.
  2. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here!
  3. “Simple Log Review Checklist Released!” is often at the top of this list – the checklist is still a very useful tool for many people. “On Free Log Management Tools” is a companion to the checklist (updated version).
  4. My classic PCI DSS Log Review series is always hot! The series of 18 posts covers a comprehensive log review approach (OK for PCI DSS 3+ in 2017 as well), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book (now out in its 4th edition!).
  5. “SIEM Resourcing or How Much the Friggin’ Thing Would REALLY Cost Me?” is a quick framework for assessing the SIEM project (well, a program, really) costs at an organization (a lot more details on this here in this paper).
  6. “Top 10 Criteria for a SIEM?” came from one of the last projects I did when running my SIEM consulting firm in 2009-2011 (for my recent work on evaluating SIEM tools, see this document).
  7. “How to Write an OK SIEM RFP?” (from 2010) contains Anton’s least hated SIEM RFP writing tips (I don’t have any favorite tips since I hate the RFP process)
  8. “An Open Letter to Android or “Android, You Are Shit!”” is an epic rant about my six year long (so far) relationship with Android mobile devices (no spoilers here – go and read it).
  9. “A Myth of An Expert Generalist” is a fun rant on what I think it means to be “a security expert” today; it argues that you must specialize within security to really be called an expert.
  10. Another old checklist, “Log Management Tool Selection Checklist Out!”, holds a top spot – it can be used to compare log management tools during the tool selection process or even a formal RFP process. But let me warn you – this is from 2010.

Disclaimer: all this content was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing.  For my current security blogging, go here.

Also see my past monthly and annual “Top Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015.

by Anton Chuvakin ( at January 04, 2017 07:11 PM

January 03, 2017

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – December 2016

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts/topics this month:
  1. “An Open Letter to Android or “Android, You Are Shit!”” is an epic rant about my six year long (so far) relationship with Android mobile devices (no spoilers here – go and read it).
  2. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here!
  3. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009. Is it relevant now? You be the judge. Succeeding with SIEM requires a lot of work, whether you paid for the software or not. BTW, this post has amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” …
  4. “Simple Log Review Checklist Released!” is often at the top of this list – this aging checklist is still a very useful tool for many people. “On Free Log Management Tools” (also aged a bit by now) is a companion to the checklist (updated version).
  5. My classic PCI DSS Log Review series is always popular! The series of 18 posts covers a comprehensive log review approach (OK for PCI DSS 3+ as well), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book (now in its 4th edition!)
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has about 5X of the traffic of this blog]: 
Current research on security analytics and UBA / UEBA:
Recent research on deception:
Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Previous post in this endless series:

by Anton Chuvakin ( at January 03, 2017 03:46 PM

January 02, 2017

The Lone Sysadmin

Standards, to and with Resolve

As the holiday season has progressed I’ve spent a bunch of time in the car, traveling three hours at a crack to see friends and family in various parts of Midwestern USA. Much of that travel has been alone, my family having decided to ensconce themselves with my in-laws for the full duration of the […]

The post Standards, to and with Resolve appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at January 02, 2017 05:45 AM

January 01, 2017

The DFIR Hierarchy of Needs & Critical Security Controls

As you weigh how best to improve your organization's digital forensics and incident response (DFIR) capabilities heading into 2017, consider Matt Swann's Incident Response Hierarchy of Needs. Likely, at some point in your career (or therapy 😉) you've heard reference to Maslow's Hierarchy of Needs. In summary, Maslow's terms (physiological, safety, belongingness & love, esteem, self-actualization, and self-transcendence) describe a pattern that human motivations generally move through, a pattern that is well represented in the form of a pyramid.
Matt has made great use of this model to describe an Incident Response Hierarchy of Needs, through which your DFIR methods should move. I argue that his powerful description of capabilities extends to the whole of DFIR rather than response alone. From Matt's Github, "the Incident Response Hierarchy describes the capabilities that organizations must build to defend their business assets. Bottom capabilities are prerequisites for successful execution of the capabilities above them:"

The Incident Response Hierarchy of Needs
"The capabilities may also be organized into plateaus or phases that organizations may experience as they develop these capabilities:"

Hierarchy plateaus or phases
As visualizations, these representations really do speak for themselves, and I applaud Matt's fine work. I would like to propose that a body of references and controls may be of use to you in achieving this hierarchy to its utmost. I also welcome your feedback and contributions regarding how to achieve each of these needs and phases. Feel free to submit controls, tools, and tactics you have or would deploy to be successful in these endeavors; I'll post your submission along with your preferred social media handle.
Aspects of the Center for Internet Security Critical Security Controls Version 6.1 (CIS CSC) can be mapped to each of Matt's hierarchical entities and phases. Below I offer one control and one tool to support each entry. Note that there is a level of subjectivity to these mappings and tooling, but the intent is to help you adopt this thinking and achieve this agenda. Following is an example for each one, starting from the bottom of the pyramid.

 INVENTORY - Can you name the assets you are defending?  
Critical Security Control #1: Inventory of Authorized and Unauthorized Devices
Family: System
Control: 1.4     
"Maintain an asset inventory of all systems connected to the network and the network devices themselves, recording at least the network addresses, machine name(s), purpose of each system, an asset owner responsible for each device, and the department associated with each device. The inventory should include every system that has an Internet protocol (IP) address on the network, including but not limited to desktops, laptops, servers, network equipment (routers, switches, firewalls, etc.), printers, storage area networks, Voice Over-IP telephones, multi-homed addresses, virtual addresses, etc.  The asset inventory created must also include data on whether the device is a portable and/or personal device. Devices such as mobile phones, tablets, laptops, and other portable electronic devices that store or process data must be identified, regardless of whether they are attached to the organization’s network." 
Tool option:
Spiceworks Inventory
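As a minimal illustration of auditing such an inventory, the sketch below checks records against the attributes control 1.4 calls for. The field names paraphrase the control text and are not any particular tool's schema:

```python
# Fields control 1.4 asks for in every inventory record (paraphrased):
# network address, machine name, purpose, owner, department, and whether
# the device is portable and/or personal.
REQUIRED_FIELDS = {
    "network_address",
    "machine_name",
    "purpose",
    "owner",
    "department",
    "portable",
}

def missing_fields(record):
    """Return the control-1.4 fields absent from an inventory record (a dict)."""
    return sorted(REQUIRED_FIELDS - record.keys())
```

Running `missing_fields()` over an exported inventory quickly shows which assets you could not actually name under questioning.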

 TELEMETRY - Do you have visibility across your assets?  
Critical Security Control #6: Maintenance, Monitoring, and Analysis of Audit Logs
Family: System
Control: 6.6
"Deploy a SIEM (Security Information and Event Management) or log analytic tools for log aggregation and consolidation from multiple machines and for log correlation and analysis.  Using the SIEM tool, system administrators and security personnel should devise profiles of common events from given systems so that they can tune detection to focus on unusual activity, avoid false positives, more rapidly identify anomalies, and prevent overwhelming analysts with insignificant alerts."
Tool option:  
AlienVault OSSIM

 DETECTION - Can you detect unauthorized activity? 
Critical Security Control #8: Malware Defenses
Family: System
Control: 8.1
"Employ automated tools to continuously monitor workstations, servers, and mobile devices with anti-virus, anti-spyware, personal firewalls, and host-based IPS functionality. All malware detection events should be sent to enterprise anti-malware administration tools and event log servers."
Tool option:
OSSEC Open Source HIDS SECurity

 TRIAGE - Can you accurately classify detection results? 
Critical Security Control #4: Continuous Vulnerability Assessment and Remediation
Family: System
Control: 4.3
"Correlate event logs with information from vulnerability scans to fulfill two goals. First, personnel should verify that the activity of the regular vulnerability scanning tools is itself logged. Second, personnel should be able to correlate attack detection events with prior vulnerability scanning results to determine whether the given exploit was used against a target known to be vulnerable."
Tool option:

 THREATS - Who are your adversaries? What are their capabilities? 
Critical Security Control #19: Incident Response and Management
Family: Application
Control: 19.7
"Conduct periodic incident scenario sessions for personnel associated with the incident handling team to ensure that they understand current threats and risks, as well as their responsibilities in supporting the incident handling team."
Tool option:
Security Incident Response Testing To Meet Audit Requirements

 BEHAVIORS - Can you detect adversary activity within your environment? 
Critical Security Control #5: Controlled Use of Administrative Privileges
Family: System
Control: 5.1
"Minimize administrative privileges and only use administrative accounts when they are required.  Implement focused auditing on the use of administrative privileged functions and monitor for anomalous behavior."
Tool option: 
Local Administrator Password Solution (LAPS)

 HUNT - Can you detect an adversary that is already embedded? 
Critical Security Control #6: Maintenance, Monitoring, and Analysis of Audit Logs       
Family: System
Control: 6.4
"Have security personnel and/or system administrators run biweekly reports that identify anomalies in logs. They should then actively review the anomalies, documenting their findings."
Tool option:
GRR Rapid Response
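As a toy illustration of the kind of anomaly report control 6.4 asks for, the sketch below flags user/action pairs that appear rarely in an aggregated log. The field names are made up; in practice they would be parsed out of whatever log source you aggregate:

```python
from collections import Counter

def rare_events(events, threshold=2):
    """Flag (user, action) pairs seen fewer than `threshold` times.

    `events` is an iterable of dicts with "user" and "action" keys --
    hypothetical field names standing in for parsed log records.
    Rare combinations are candidates for the analyst's review queue.
    """
    counts = Counter((e["user"], e["action"]) for e in events)
    return sorted(pair for pair, n in counts.items() if n < threshold)
```

A biweekly run over the previous fortnight's logs gives the reviewer a short, prioritized list of anomalies to document, rather than the raw event stream.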

 TRACK - During an intrusion, can you observe adversary activity in real time? 
Critical Security Control #12: Boundary Defense
Family: Network
Control: 12.10
"To help identify covert channels exfiltrating data through a firewall, configure the built-in firewall session tracking mechanisms included in many commercial firewalls to identify TCP sessions that last an unusually long time for the given organization and firewall device, alerting personnel about the source and destination addresses associated with these long sessions."
Tool option:
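To illustrate the session-tracking idea in control 12.10, a minimal sketch that flags unusually long sessions might look like the following. The tuple layout is hypothetical, standing in for a firewall's exported session table, and the eight-hour default is an arbitrary example threshold:

```python
def long_sessions(sessions, max_seconds=8 * 3600):
    """Return session records lasting longer than max_seconds.

    `sessions` is an iterable of (src, dst, start_epoch, end_epoch)
    tuples -- a made-up shape for a firewall's session-table export.
    Unusually long sessions may indicate a covert exfiltration channel.
    """
    return [s for s in sessions if s[3] - s[2] > max_seconds]
```

The flagged source and destination addresses are what you would alert personnel about, per the control text.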

 ACT - Can you deploy countermeasures to evict and recover? 
Critical Security Control #20: Penetration Tests and Red Team Exercises       
Family: Application
Control: 20.3
"Perform periodic Red Team exercises to test organizational readiness to identify and stop attacks or to respond quickly and effectively."
Tool option:
Red vs Blue - PowerSploit vs PowerForensics

 Can you collaborate with trusted parties to disrupt adversary campaigns? 
Critical Security Control #19: Incident Response and Management       
Family: Application
Control: 19.5
"Assemble and maintain information on third-party contact information to be used to report a security incident (e.g., maintain an e-mail address of or have a web page"
Tool option:

I've mapped the hierarchy to the controls in the CIS CSC 6.1 spreadsheet, again based on my experience and perspective; yours may differ, but consider similar activity.

CIS CSC with IR Hierarchy mappings

My full mapping of Matt's Incident Response Hierarchy of Needs in the
CIS CSC 6.1 spreadsheet is available here:

I truly hope you familiarize yourself with Matt's Incident Response Hierarchy of Needs and find ways to implement, validate, and improve your capabilities accordingly. Consider that the controls and tools mentioned here are but a starting point and that you have many other options available to you. I look forward to hearing from you regarding your preferred tactics and tools as well. Kudos to Matt for framing this essential discussion so distinctly.

by Russ McRee ( at January 01, 2017 04:00 AM

December 31, 2016

Steve Kemp's Blog

So I'm gonna start doing arduino-things

Since I've got a few weeks off I've decided I need to find a project, or two, to occupy me. Happily the baby is settling in well, mostly he sleeps for 4-5 hours, then eats, before the cycle repeats. It could have been so much worse.

My plan is to start exploring Arduino-related projects. It has been years since I touched hardware, with the exception of building a new PC for myself every 12-48 months.

There are a few "starter kits" you can buy, consisting of a board, and some discrete components such as a bunch of buttons, an LCD-output screen, some sensors (pressure, water, tilt), etc.

There are also some nifty little pre-cooked components you can buy such as:

The appeal of the former is that I can get the hang of marrying hardware with software, and the appeal of the latter is that the whole thing is pre-built, so I don't need to worry about anything complex. Judging by similar builds people have made, the process is more akin to building with Lego than to real hardware assembly.

So, for the next few weeks my plan is to:

  • Explore the various sensors, and tutorials, via the starter-kit.
  • Wire the MP3-playback device to a wireless D1-mini-board.
    • Which will allow me to listen to (static) music stored on an SD-card.
    • And send "next", "previous", "play", "volume-up", etc., from a mobile.

The end result should be that I will be able to listen to music in my living room. Albeit in a constrained fashion (if I want to change the music I'll have to swap out the files on the SD-card). But it's something that's vaguely useful, and something that I think is within my capability, even as a beginner.

I'm actually not sure what else I could usefully do, but I figured I could probably wire up a vibration sensor to another wireless board. The device can sit on the top of my washing machine:

  • If vibration is sensed move into the "washing is on" state.
    • If vibration stops after a few minutes move into the "washing machine done" state.
      • Send an HTTP GET request, which will trigger an SMS/similar.

There's probably more to it than that, but I expect that a simple vibration sensor will be sufficient to allow me to get an alert of some kind when the washing machine is ready to be emptied - and I don't need to poke inside the guts of the washing machine, nor hang reed-switches off the door, etc.
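
The state transitions above can be sketched independently of the hardware (plain Python with invented names and thresholds, not a finished design; on the device the `notify` callback would be the HTTP GET):

```python
class WashingMonitor:
    """Tiny state machine: IDLE -> WASHING -> DONE.

    Feed it (timestamp, vibrating) samples; once vibration has stopped
    for `quiet_secs` after a washing cycle, `notify` is called exactly
    once and the state becomes DONE.
    """
    def __init__(self, notify, quiet_secs=180):
        self.notify = notify
        self.quiet_secs = quiet_secs
        self.state = "IDLE"
        self.last_vibration = None

    def sample(self, now, vibrating):
        if vibrating:
            self.last_vibration = now
            if self.state == "IDLE":
                self.state = "WASHING"
        elif (self.state == "WASHING"
              and self.last_vibration is not None
              and now - self.last_vibration >= self.quiet_secs):
            self.state = "DONE"
            self.notify()
        return self.state
```

The same three-state logic would port straightforwardly to an Arduino sketch or a MicroPython script on the wireless board.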

Anyway the only downside to my plan is that no doubt shipping the toys from AliExpress will take 2-4 weeks. Oops.

December 31, 2016 07:11 PM

The Geekess

Update on Sentiment Analysis of FOSS communities

One of my goals with my new open source project, FOSS Heartbeat, has been to measure the overall sentiment of communication in open source communities. Are the communities welcoming and friendly, hostile, or neutral? Does the bulk of positive or negative sentiment come from core contributors or outsiders? In order to make this analysis scale across multiple open source communities with years of logs, I needed to be able to train an algorithm to recognize the sentiment or tone of technical conversation.

How can machine learning recognize human language sentiment?

One of the projects I’ve been using is the Stanford CoreNLP library, an open source Natural Language Processing (NLP) project. The Stanford CoreNLP takes a set of training sentences (manually marked so that each word and each combined phrase has a sentiment) and it trains a neural network to recognize the sentiment.

The problem with any form of artificial intelligence is that the input into the machine is always biased in some way. For the Stanford CoreNLP, their default sentiment model was trained on movie reviews. That means, for example, that the default sentiment model thinks “Christian” is a very positive word, whereas in an open source project that’s probably someone’s name. The default sentiment model also consistently marks any sentence expressing a neutral technical opinion as having a negative tone. Most people leaving movie reviews either hate or love the movie, and people are unlikely to leave a neutral review analyzing the technical merits of the special effects. Thus, it makes sense that a sentiment model trained on movie reviews would classify technical opinions as negative.

Since the Stanford CoreNLP default sentiment model doesn’t work well on technical conversation, I’ve been creating a new set of sentiment training data that only uses sentences from open source projects. That means that I have to manually modify the sentiment of words and phrases in thousands of sentences that I feed into the new sentiment model. Yikes!
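
For context, CoreNLP's sentiment training data is, as I understand the format, one binarized parse tree per line with a 0-4 sentiment label (0 = very negative, 4 = very positive) on every node. A small helper - my own sketch, not part of CoreNLP - can tally how labels are distributed in such a file:

```python
import re

def label_counts(treebank_line):
    """Count the sentiment labels (0-4) on every node of one
    PTB-style training tree, e.g. '(3 (2 This) (4 (2 is) (4 great)))'.
    Each node opens with '(<label> ', so we match '(' plus one digit."""
    counts = {n: 0 for n in range(5)}
    for m in re.finditer(r"\((\d)\s", treebank_line):
        counts[int(m.group(1))] += 1
    return counts
```

Tallies like this make it easy to see class imbalance in a hand-built training set, such as having far fewer negative than neutral phrases.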

As of today, the Stanford CoreNLP default sentiment model has ~8,000 sentences in their training file. I currently have ~1,200 sentences. While my model isn’t as consistent as the Stanford CoreNLP, it is better at recognizing neutral and positive tone in technical sentences. If you’re interested in the technical details (e.g. specificity, recall, false positives and the like), you can take a look at the new sentiment model’s stats. This blog post will attempt to present the results without diving into guided machine learning jargon.

Default vs New Models On Positive Tone

Let’s take a look at an example of a positive code review experience. The left column is from the default sentiment model in Stanford CoreNLP, which was trained on movie reviews. The right column is from the new sentiment model I’ve been training. The colors of the sentence encode what the two models think the overall tone of the sentence is:

  • Very positive
  • Positive
  • Neutral
  • Negative
  • Very negative

Hey @1Niels 🙂 is there a particular reason for calling it Emoji Code?

I think the earlier guide called it emoji name.

A few examples here would help, as well as explaining that the pop-up menu shows the first five emojis whose names contain the letters typed.

(I’m sure you have a better way of explaining this than me :-).

@arpith I called them Emoji code because that’s what they’re called on Slack’s emoji guide and more commonly only other websites as well.

I think I will probably change the section name from Emoji Code to Using emoji codes and I’ll include your suggestion in the last step.

Thanks for the feedback!


The default model, trained on movie reviews, rated 4 of the 7 sentences as negative and 1 as positive. As you can see, it tends to classify neutral technical talk as having a negative tone, including sentences like “I called them Emoji code because that’s what they’re called on Slack’s emoji guide and more commonly only other websites as well.” It did recognize the sentence “Thanks for the feedback!” as positive, which is good.

The new model, trained on comments from open source projects, rated 1 sentence as negative, 2 as positive, and 1 as very positive. Most of the positive tone of this example comes from the use of smiley faces, which I’ve been careful to train the new model to recognize. Additionally, I’ve been teaching it that an exclamation point ending an overall-positive sentence shifts the tone to very positive. I’m pleased to see it pick up on those subtleties.

Default vs New Models On Neutral Tone

Let’s have a look at a neutral tone code review example. Again, the sentence sentiment color key is:

  • Very positive
  • Positive
  • Neutral
  • Negative
  • Very negative

This seems to check resolvers nested up to a fixed level, rather than checking resolvers and namespaces nested to an arbitrary depth.

I think a inline-code is more appropriate here, something like “URL namespace {} is not unique, you may not be able to reverse all URLs in this namespace”.

Errors prevent management commands from running, which is a bit severe for this case.

One of these should have an explicit instance namespace other than inline-code, otherwise the nested namespaces are not unique.

Please document the check in inline-code.

There’s a list of URL system checks at the end.


Again, the default sentiment model trained on movie reviews classifies neutral review comments as negative, marking 5 of the 6 sentences as negative.

The new model trained on open source communication is a bit mixed on this example, marking 1 sentence as positive and 1 negative, out of 6 sentences. Still, 4 out of 6 sentences were correctly marked as neutral, which is pretty good, given the new model has a training set that is 8 times smaller than the movie review set.

Default vs New Models On Negative Tone

Let’s take a look at a negative example. Please note that this is not a community that I am involved in, and I don’t know anyone from that community. I found this particular example because I searched for “code of conduct”. Note that the behavior displayed on the thread caused the initial contributor to offer to abandon their pull request. A project outsider stated they would recommend their employer not use the project because of the behavior. Another project member came along to ask for people to be more friendly. So quite a number of people thought this behavior was problematic.

Again, the sentiment color code is:

  • Very positive
  • Positive
  • Neutral
  • Negative
  • Very negative

Dude, you must be kidding everyone.

What dawned on you – that for a project to be successful and useful it needs confirmed userbase – was crystal clear to others years ago.

Your “hard working” is little comparing to what other people have been doing for years.

Get humbler, Mr. Arrogant.

If you find this project great, figure out that it is so because other people worked on it before.

Learn what they did and how.

But first learn Python, as pointed above.

Then keep working hard.

And make sure the project stays great after you applied your hands to it.


The default model trained on movie reviews classifies 4 out of 9 sentences as negative and 2 as positive. The new model classifies 2 out of 9 sentences as negative and 2 as positive. In short, the new model needs more work.

It’s unsurprising that the new model doesn’t yet recognize negative sentiment very well, since I’ve been focusing on making sure it can recognize positive sentiment and neutral talk. The training set currently has 110 negative sentences out of 1,205 sentences total. I simply need more negative examples, and they’re hard to find because many subtle personal attacks, insults, and slights don’t use curse words. If you look at the example above, there are no good search terms, aside from the word arrogant, even though the sentences are still put-downs that create an us-vs-them mentality. Despite not using slurs or curse words, many people found the thread problematic.

The best way I’ve settled on to find negative sentiment examples is to look for “communication meta words” or people talking about communication style. My current list of search terms includes words like “friendlier”, “flippant”, “abrasive”, and similar. Some search words like “aggressive” yield too many false positives, because people talk about things like “aggressive optimization”. Once I’ve found a thread that contains those words, I’ll read through it and find the comments that caused the people to ask for a different communication style. Of course, this only works for communities that want to be welcoming. For other communities, searching for the word “attitude” seems to yield useful examples.
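
That mining heuristic can be sketched as a filter (the word lists are illustrative, lifted from the terms mentioned above; this only surfaces threads to read by hand, it does not classify sentiment):

```python
# "communication meta words": people talking about communication style
META_WORDS = {"friendlier", "flippant", "abrasive", "attitude"}
# phrases that make a hit a likely false positive
BENIGN_PHRASES = ["aggressive optimization", "aggressive caching"]

def candidate_threads(comments):
    """Return ids of comments that talk *about* communication style.

    comments: iterable of (comment_id, text) pairs.
    """
    hits = []
    for cid, text in comments:
        lowered = text.lower()
        if any(p in lowered for p in BENIGN_PHRASES):
            continue  # technical use of an otherwise loaded word
        if any(w in lowered.split() for w in META_WORDS) or "aggressive" in lowered:
            hits.append(cid)
    return hits
```

The surviving hits still need a human pass to pull out the specific comments that prompted the request for a different communication style.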

Still, it’s a lot of manual labor to identify problematic threads and fish out the negative sentences that are in those threads. I’ll be continuing to make progress on improving the model to recognize negative sentiment, but it would help if people could post links to negative sentiment examples on the FOSS Heartbeat github issue or drop me an email.

Visualizing Sentiment

Although the sentiment model isn’t perfect, I’ve added visualization for the sentiment of several communities on FOSS Heartbeat, including 24pullrequests, Dreamwidth, systemd, elm, fsharp, and opal.

The x-axis is the date. I used the number of neutral comments in an issue or pull request as the y-axis coordinate, with the error bars indicating the number of positive and negative comments. If a thread had twice as many negative comments as positive comments, it was marked as negative. If it had twice as many positive comments as negative comments, it was marked as positive. If neither sentiment won and more than 80% of the comments were neutral, it was marked as neutral. Otherwise the issue or pull request was marked as mixed sentiment.
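
Written out, the thread-classification rule above looks roughly like this (a sketch of the logic as described, not the actual FOSS Heartbeat code):

```python
def classify_thread(pos, neg, neutral):
    """Classify an issue/PR from its per-comment sentiment counts."""
    total = pos + neg + neutral
    if total == 0:
        return "neutral"
    if neg >= 2 * pos and neg > 0:
        return "negative"   # negative comments dominate 2:1
    if pos >= 2 * neg and pos > 0:
        return "positive"   # positive comments dominate 2:1
    if neutral / total > 0.8:
        return "neutral"    # neither won, and mostly neutral
    return "mixed"
```

For example, a thread with 197 positive and 441 negative comments classifies as negative under this rule, regardless of how many neutral comments surround them.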

Here’s an example:


The sentiment graph is from the 24pullrequests repository. It’s a ruby website that encourages programmers to gift code to open source projects during the 24 days in December before Christmas. One of the open source projects you can contribute to is the 24 pull requests site itself (isn’t that meta!). During the year, you’ll see the site admins filing help-wanted enhancements to update the software that runs the website or tweak a small feature. They’re usually closed within a day without a whole lot of back and forth between the main contributors. The mid-year contributions show up as the neutral, low-comment dots throughout the year. When the 24 pull request site admins do receive a gift of code to the website by a new contributor as part of the 24 pull requests period, they’re quite thankful, which you can see reflected in the many positive comments around December and January.

Another interesting example to look at is negative sentiment in the opal community:


That large spike with 1207 neutral comments, 197 positive comments, and 441 negative comments is the opal community issue to add a code of conduct. Being able to quickly see which threads are turning into flamewars would be helpful to community managers and maintainers who have been ignoring the issue tracker to get some coding done. Once the sentiment model is better trained, I would love to analyze whether communities become more positive or more neutral after a Code of Conduct is put in place. Tying that data to whether more or less newcomers participate after a Code of Conduct is in place may be interesting as well.

There are a lot of real-world problems that sentiment analysis, participation data, and a bit of psychology could help us identify. One common social problem is burnout, which is characterized by an increased workload (stages 1 & 2), working at odd hours (stage 3), and an increase in negative sentiment (stage 6). We have participation data, comment timestamps, and sentiment for those comments, so we would only need some examples of burnout to identify the pattern. By being aware of the burnout stages of our collaborators, we could intervene early to help them avoid a spiral into depression.
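
As a rough illustration (the odd-hours window is invented for this sketch, and real detection would need labeled examples of burnout, as noted above), two of those signals could be screened for like this:

```python
from datetime import datetime

def burnout_signals(comments, odd_start=0, odd_end=6):
    """Rough screen for burnout warning signs from comment metadata.

    comments: list of (iso_timestamp, sentiment) where sentiment is
    'pos', 'neg', or 'neutral'.  Returns (fraction of comments made at
    odd hours, fraction of comments with negative sentiment).
    """
    if not comments:
        return 0.0, 0.0
    odd = sum(1 for ts, _ in comments
              if odd_start <= datetime.fromisoformat(ts).hour < odd_end)
    neg = sum(1 for _, s in comments if s == "neg")
    return odd / len(comments), neg / len(comments)
```

Tracking these fractions per contributor over time, alongside workload, is the kind of pattern-matching that would make early intervention possible.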

A more corporate-focused use might be to identify issues where key customers express frustration and anger, and to focus developers on fixing the squeaky wheel. If FOSS Heartbeat were extended to analyze comments on mailing lists, Slack, Discourse, or Mattermost, companies could get a general idea of customer sentiment after a new software release. Companies could also use the participation data and data about who is merging code to figure out which projects or parts of their code are not being well-maintained, and assign additional help, as the exercism community did.

Another topic of interest to communities hoping to grow their developer base would be identifying the key factors that cause newcomers to become more active contributors to a project. Is it a positive welcome? A mentor suggesting a newcomer tackle a medium-sized issue by tagging them? Does adding documentation about a particularly confusing area cause more newcomers to submit pull requests to that area of code? Does code review from a particularly friendly person cause newcomers to want to come back? Or maybe code review lag causes them to drop off?

These are the kinds of people-centric community questions I would love to answer by using FOSS Heartbeat. I would like to thank Mozilla for sponsoring the project for the last three months. If you have additional questions you’d love to see FOSS Heartbeat answer, I’m available for contract work through Otter Tech. If you’re thankful about the work I’ve put in so far, you can support me through my patreon.

What open source community question would you like to see FOSS Heartbeat tackle? Feel free to leave a comment.

by sarah at December 31, 2016 01:10 AM

December 29, 2016

Anton Chuvakin - Security Warrior

An Open Letter to Android or “Android, You Are Shit!”

Dear Android:

I know you are an operating system and probably cannot (yet?) read on your own. However, recent events compelled me to write this letter to you; an idea for it literally came to me in a dream.

You see, I have carried an Android phone in my pocket since 2010, for almost six years. First a Sony Xperia X10 (eventually running a venerable Android 2.3.7), then another phone, then a Google Nexus 4, and now a Google Nexus 5X (sporting Android 7.1.1). At some point, I traded an iPad for a Google Nexus 9. A [sort of] Android Amazon Fire is my living room Android. I have convinced my wife to start using Android as well and she became a fan too. This represents a multi-year love affair with you, dear Android.

In fact, dear Android, I often had to defend you from packs of rabid Apple fanboys, generally with good results - I either won or we had a draw. Over the years, I had to defend my mobile technology choices from many people: “No, it is NOT an iPhone, it is a Nexus”, “Yes, I chose Android because I like it more than the iPhone, not because it is cheaper”, “Yes, I think Google Now is way more useful than Siri”, etc, etc. I’ve counter-attacked with arguments about the “closed Apple ecosystem”, “one stupid button” and “overpriced devices.” As a person who follows information technology, I am aware of Android’s many strengths, such as better background processing and multi-tasking, security improvements, a flexible user interface, Google Now integration, etc.

However, as I am writing this, my beloved Nexus 5X is no longer with me. In fact, recent events have triggered some soul-searching and ultimately this letter. While doing my soul-searching, I realized that my love affair with you, Android, has some strong dysfunctional notes. You see, I think I always suspected that you are shit.

Over the years, I’ve been using my Android devices carefully and thoughtfully – I never rooted them, never sideloaded apps [well, not to my main personal phone], and I even tried to minimize my use of non-Google applications, etc.  However, as I recall my experiences with Android over the last six years, I am saddened to report that you, Android, never really worked quite right.

In fact, I distilled my reasons to calling you “shit” to one key point: I have never really trusted you, because you have never worked reliably enough to earn such trust.

Indeed, my Sony phone would sometimes crash and reboot, or freeze (“battery out” was the only cure). I of course explained it by “growing pains of Android, the new mobile OS”…after all, you were just in v.2, practically a baby. My Nexus 4 used to crash and shut down as well; apps would often drain the battery to zero without any warning. Furthermore, even nowadays, my Google Nexus 9 tablet (running Android 7.1.1) will occasionally just shut down out of the blue – I just had to restart it earlier today. A few days before my Nexus 5X’s untimely death - just 1 year and 9 days after purchase - the phone rebooted when I launched the Camera app. Such random reboots and crashes were not common with my Nexus 5X, but they did happen periodically. And then finally, my Nexus 5X entered an endless reboot loop a few days after the 7.1.1 OTA update and now has to be replaced. No troubleshooting steps helped.

OK, Google, you want to blame the hardware, perhaps? My experiences over the last 6 years sap the energy from this argument. I used hardware from 3 different makers, all running Android, all having stability problems.

You see, Android, I don’t care about improved malware protection, a faster UI, or the fact that you are “really Linux.” I don’t care about your growing market share. An OS that cannot stay up is a shit OS. And you, my dear OS friend, are shit.

In fact, as my employer gave me an iPhone (first a 4S and now a 5S), a peculiar pattern of behavior developed in my life: if I absolutely, positively had to call an Uber on a dark and stormy night, I would stash my work iPhone in my back pocket, just in case. If I had to show a boarding pass to a permanently angry TSA agent, I would print it or use the iPhone. In fact, I was not even aware of this “if it has to happen – use iPhone” pattern until my wife asked me why I was printing another boarding pass and I said “OK, I guess I can use an iPhone for that” – and so I realized that I just won’t trust my Android device with this.

Dear Android, you may be a full-featured OS now, but you are just not mission-critical. In fact, you are the opposite of that – you are iffy. And the only explanation for why a version SEVEN (not a version TWO with growing pains, mind you) still cannot achieve this reliability is obvious to me – you are shit.

Android, I’ve never really trusted you and I don’t trust you now. I’ve lived with you since your version 2.1 to a current 7.1.1. The only way you can still have "growing pains" after so many years is that you are a shit OS.

Despite all that, dear Android, I will take one more chance with you. When my Google Nexus 5X is repaired and then hopefully continues working for a while, I will stick to using you. But, sorry, no promises beyond that point!

Respectfully ... but distrustfully,

Dr. Anton Chuvakin
(as a consumer, NOT as a technology analyst!)

by Anton Chuvakin ( at December 29, 2016 08:09 PM

December 28, 2016


Check Out My TeePublic Designs

Over the years fans of this blog have asked if I would consider selling merchandise with the TaoSecurity logo. When I taught classes for TaoSecurity from 2005-2007 I designed T-shirts for my students and provided them as part of the registration package. This weekend I decided to exercise my creative side by uploading some designs to TeePublic.

TeePublic offers clothing along with mugs, phone cases, notebooks, and other items.

Two are based on the TaoSecurity logo. One includes the entire logo, along with the company motto of "The Way of Digital Security." The second is a close-up of the TaoSecurity S, which is a modified yin-yang symbol.

Two other designs are inspired by network security monitoring. One is a 1989-era map of MilNet, the United States' military network. This image is found in many places on the Internet, and I used it previously in my classes. The second is a close-up of a switch and router from the TaoSecurity labs. I used this equipment to create packet captures for teaching network security monitoring.

I hope you like these designs. I am particularly partial to the TaoSecurity Logo mug, the TaoSecurity S Logo Mug, and TaoSecurity S Logo t-shirt.

Let me know what you think via comments here.

Update 28 Dec 2016:

Check out the MilNet mug!

by Richard Bejtlich ( at December 28, 2016 04:50 PM

December 26, 2016

Steve Kemp's Blog

I finally made something worthwhile.

So for once I made something useful.


Oiva Adam Kemp.

Happy Christmas, if you believe in that kind of thing.

December 26, 2016 09:34 AM


Choria Playbooks

Today I am very pleased to release something I’ve been thinking about for years and actively working on since August.

After many POCs and thrown-away attempts at this over the years, I am finally releasing a Playbook system that lets you run workflows on your MCollective network – it can integrate with a near-endless set of remote services in addition to your MCollective to create a multi-service playbook system.

This is an early release with only a few integrations, but I think it’s already useful and I’m looking for feedback and integrations to build this into something really powerful for the Puppet ecosystem.

The full docs can be found on the Choria Website, but below you can get some details.


Today playbooks are basic YAML files. Eventually I envision a Service to execute playbooks on your behalf, but today you just run them in your shell, so they are pure data.

Playbooks have a basic flow that is more or less like this:

  1. Discover named Node Sets
  2. Validate the named Node Sets meet expectations such as reachability and versions of software available on them
  3. Run a pre_book task list that lets you do prep work
  4. Run the main tasks task list where you do your work, around every task certain hook lists can be run
  5. Run either the on_success or on_fail task list for notifications to Slack etc
  6. Run the post_book task list for cleanups etc
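
The ordering of those task lists can be sketched in a few lines (illustrative Python only - the real Choria runner also performs the discovery and validation steps before any task runs, and `TaskFailed` plus the dict layout are my own stand-ins):

```python
class TaskFailed(Exception):
    pass

def run_playbook(pb, run_task):
    """Run a playbook's task lists in the documented order.

    pb: dict with optional 'pre_book', 'tasks', 'on_success',
    'on_fail', and 'post_book' lists.  run_task runs one task and
    raises TaskFailed on error.  Returns a log of (list_name, task).
    """
    log = []

    def run_list(name):
        for task in pb.get(name, []):
            log.append((name, task))
            run_task(task)

    try:
        run_list("pre_book")   # prep work
        run_list("tasks")      # the main work
    except TaskFailed:
        run_list("on_fail")    # e.g. notify Slack of failure
    else:
        run_list("on_success")
    finally:
        run_list("post_book")  # cleanups always run
    return log
```

The `try`/`else`/`finally` split is the whole point: exactly one of on_success or on_fail runs, and post_book runs regardless.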

Today a task can be an MCollective request, a shell script or a Slack notification. I imagine this list will grow huge; I am thinking you will want to ping webhooks, or interact with Razor to provision machines and wait for them to finish building, run Terraform or make EC2 API requests. The list of potential integrations is endless and you can use any task in any of the above task lists.

A Node Set is simply a named set of nodes, in MCollective that would be certnames of nodes but the playbook system itself is not limited to that. Today Node Sets can be resolved from MCollective Discovery, PQL Queries (PuppetDB), YAML files with groups of nodes in them or a shell command. Again the list of integrations that make sense here is huge. I imagine querying PE or Foreman for node groups, querying etcd or Consul for service members. Talking to random REST services that return node lists or DB queries. Imagine using Terraform outputs as Node Set sources or EC2 API queries.

In cases where you wish to manage nodes via MCollective but you are using a cached discovery source you can ask node sets to be tested for reachability over MCollective. And node sets that need certain MCollective agents can express this desire as SemVer version ranges and the valid network state will be asserted before any playbook is run.

Playbooks do not have a pseudo programming language in them though I am not against the idea. I do not anticipate YAML to be the end format of playbooks but it’s good enough for today.


I’ll show an example here of what I think you will be able to achieve using these Playbooks.

Here we have a web stack and we want to do Blue/Green deploys against it; sub-clusters are identified by a fact called cluster. The deploy process for a cluster is:

  • Gather input from the user such as cluster to deploy and revision of the app to deploy
  • Discover the Haproxy node using Node Set discovery from PQL queries
  • Discover the Web Servers in a particular cluster using Node Set discovery from PQL queries
  • Verify the Haproxy nodes and Web Servers are reachable and running the versions of agents we need
  • Upgrade the specific web tier using:
    1. Tell the ops room on slack we are about to upgrade the cluster
    2. Disable puppet on the webservers
    3. Wait for any running puppet runs to stop
    4. Disable the nodes on a particular haproxy backend
    5. Upgrade the apps on the servers using appmgr#upgrade to the input revision
    6. Do up to 10 NRPE checks post-upgrade, with 30 seconds between checks, to ensure the load average is GREEN; you’d use a better check here, something app-specific
    7. Enable the nodes in haproxy once NRPE checks pass
    8. Fetch and display the status of the deployed app – like what version is there now
    9. Enable Puppet

Should the task list all FAIL we run these tasks:

  1. Call a webhook on AWS Lambda
  2. Tell the ops room on slack
  3. Run a whole other playbook called deploy_failure_handler with the same parameters

Should the task list PASS we run these tasks:

  1. Call a webhook on AWS Lambda
  2. Tell the ops room on slack
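
Step 6 in the upgrade list - retrying a health check a bounded number of times before declaring failure - is a pattern worth a tiny sketch of its own (generic Python, not Choria's actual task syntax; the injectable `sleep` is just so the logic can be tested without waiting):

```python
import time

def wait_until_green(check, tries=10, delay=30, sleep=time.sleep):
    """Run `check` up to `tries` times, `delay` seconds apart, until it
    reports 'GREEN'.  Returns True on success, False if it never went
    green (the playbook would then run its on_fail list)."""
    for attempt in range(tries):
        if check() == "GREEN":
            return True
        if attempt < tries - 1:
            sleep(delay)  # wait before re-checking
    return False
```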

This example and sample playbooks etc can be found on the Choria Site.


Above is the eventual goal. Today the major missing piece is that MCollective needs to be extended with the ability for Agent plugins to deliver a Macro plugin. A macro might be something like Puppet.wait_till_idle(:timeout => 600); this would be something you call after disabling the nodes when you want to be sure Puppet is making no more changes - you can see the workflow above needs this.

There are no such Macros today; I will add a stop-gap solution as a task that waits for a certain condition, but adding Macros to MCollective is high on my to-do list.

Other than that it works. There is no web service yet, so you run playbooks from the CLI, and the integrations listed above are all that exist; they are quite easy to write, so I’m hoping some early adopters will either give me ideas or send PRs!

This is available today if you upgrade to version 0.0.12 of the ripienaar-mcollective_choria module.

See the Choria Website for much more detail on this feature and a detailed roadmap.

UPDATE: Since posting this blog I had some time and added: Terraform Node Sets, ability to create GET and POST Webhook requests and the much needed ability to assert and wait for remote state.

by R.I. Pienaar at December 26, 2016 09:06 AM

December 25, 2016

System Administration Advent Calendar

Day 25 - Building a Team CLI with Python: One Alternative to ChatOps

Written by: Jan Ivar Beddari (@beddari)
Edited by: Nicholas Valler (@nvaller)


ChatOps is a great idea. Done right, it creates a well-defined collaborative
space where the barriers to entry are low and sharing improvements is quick.
Because of the immediate gains in speed and ease, ChatOps implementations have
a tendency to outgrow their original constraints. If this happens, the amount
of information and interrupts a team member is expected to filter and process
might become unmanageable. To further complicate the issue, reaching that limit
is a personal experience. Some might be fine with continuously monitoring three
dashboards and five chat rooms and still get their work done. Others are more
sensitive, and perhaps end up fighting feelings of guilt or incompetence.

Being sufficiently explicit about what and when information reaches team
members takes time to get right. For this reason, I consider shared filtering
to be an inherent attribute of ChatOps, and a very challenging problem to
solve. As humans think and reason differently given the same input, building
and encouraging collaboration around a visible ‘robot’ perhaps isn’t the best idea.

Defining the Team CLI

As an engineer, taking one step back, what alternative approaches exist that
would bring a lot of the same gains as the ChatOps pattern? We want it to be
less intrusive and not as tied to communication, hopefully increasing the
attention and value given to actual human interaction in chat rooms. To me, one
possible answer is to provide a team centric command line interface. This is
a traditional UNIX-like command line tool to run in a terminal window,
installed across all team members environments. Doing this, we shift our focus
from sharing a centralized tool to sharing a decentralized one. In a
decentralized model, there is an (apparent) extra effort needed to signal or
interrupt the rest of the team. This makes the operation more conscious, which
is a large win.

With a distributed model, where each team member operates in their own context,
a shared cli gives the opportunity to streamline work environments beyond the
capabilities of a chatbot API.

Having decided that this is something we’d like to try, we continue defining a
requirements list:

  • Command line UX similar to existing tools
  • Simple to update and maintain
  • Possible to extend very easily

There’s nothing special or clever about these three requirements. Simplicity is
the non-listed primary goal, using what experience we have to try getting
something working quickly. To further develop these ideas we’ll break down the
list and try to pinpoint some choices we’re making.

Command line UX similar to existing tools

Ever tried sharing a folder full of scripts using git? Scripts don’t really
need docs, and by reading git commits everyone can follow along with updates to
the folder, right? No. It just does not work. Shared tooling needs constraints.
Just pushing /usr/local/bin into git will leave people frustrated at the lack
of coherency. As the cognitive load forces people into forking their own
versions of each tool or script, any gains you were aiming for by sharing them
are lost.

To overcome this we need standards. It doesn’t have to involve much work, as we
already mostly agree on what a good cli UX is - something similar to well-known
tools we already use. Thus we should be able to quickly set some rules and move on:

  • A single top level command tcli is the main entry point of our tool
  • All sub-commands are modules organized semantically using one of the two
    following syntax definitions:

    tcli module verb arguments
    tcli module subject verb arguments

  • Use of options is not defined but every module must implement --help

Unlike a folder of freeform scripts, this is a strict standard. But even so,
the standard is easy to understand and reason about. Its purpose is to create
just enough order and consistency to make sharing and reuse within our team
possible.

Simple to update and maintain

Arguably - also a part of the UX - are updates and maintenance. A distributed
tool shared across a team needs to be super simple to maintain and update. As a
guideline, anything more involved than running a single command would most
likely be off-putting. Having the update process stay out of any critical usage
paths is equally important. We can’t rely on a tool that blocks to check a
remote API for updates in the middle of a run. That would break our most valued
expectation - simplicity. To solve this with a minimal amount of code, we could
reuse some established external mechanism to do update checks.

  • Updates should be as simple as possible, ideally git pull-like.
  • Don’t break expectations by doing calls over the network, shell out to
    package managers or similar.
  • Don’t force updates, stay out of any critical paths.

Possible to extend very easily

Extending the tool should be as easy as possible and is crucial to its long
term success and value. Typically there’s a large amount of hidden specialist
knowledge in teams. Using a collaborative command line tool could help share
that knowledge if the barrier to entry is sufficiently low. In practice, this
means that the main tool must be able to discover and run a wide variety of
extensions or plugins delivered using different methods, even across language
platforms. A great example of this is how it is possible to extend git with
custom sub-commands just by naming them git-my-command and placing them in
your path.

Another interesting generic extension point to consider is running Docker
containers as plugin modules in our tool. There’s a massive amount of tooling
already packaged that we’d be able to reuse with little effort. Just be sure to
maintain your own hub of canonical images from a secure source if you are doing
this for work.

Our final bullet point list defining goals for extensions:

  • The native plugin interface must be as simple as possible
  • Plugins should be discovered at runtime
  • Language and platform independent external plugins are a first class use case

Summoning a Python skeleton

Having done some thinking to define what we want to achieve, it’s time to start
writing some code. But why Python? What about Ruby, or Golang? The answer is
disappointingly simple: for the sake of building a pluggable cli tool, it does
not matter much what language we use. Choose the one that feels most
comfortable and start building. Due to our design choice to be able to plug
anything, reimplementing the top command layer in a different language later
would not be hard.

So off we go using Python. Anyone having spent time with it would probably
recognize some of the projects listed on the site, all of
them highly valued with great documentation available. When I learned that it
also hosts a cli library called Click, I was intrigued by its description:

“Click is a Python package for creating beautiful command line interfaces in a
composable way with as little code as necessary.”

Sounds perfect for our needs, right? Again, the documentation is great as it
doesn’t assume anything and provides ample examples. Let’s try to get ‘hello
tcli’ working!

Hello tcli!

The first thing we’ll need is a working Python dev environment. That could mean
using a virtualenv, a tool and method used for separating libraries and
Python runtimes. If just starting out you could use virtualenvwrapper, which
further simplifies managing these envs. Of course you could also just skip all
this and go with Vagrant, Docker or some other environment, which will be
just fine. If you need help with this step, please ask!

Let’s initialize a project, here using virtualenvwrapper:

mkvirtualenv tcli
mkdir -p ~/sysadvent/tcli/tcli
cd ~/sysadvent/tcli
git init

Then we’ll create the three files that make up our skeleton implementation.
First, tcli/cli.py with our main function cli() that defines our topmost
command:

import click


@click.group()
def cli():
    """tcli is a modular command line tool wrapping and simplifying common
    team related tasks."""

Next an empty file to mark the tcli sub-directory as
containing Python packages:

touch tcli/__init__.py

Last we’ll add a setup.py file that describes our Python package and its dependencies:

from setuptools import setup, find_packages
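A minimal setup.py consistent with what the text describes might look like this sketch; the version number is a placeholder, while click in install_requires and the tcli console script are implied by the article (the tcli.cli:cli entry point assumes the file layout used here):

```python
from setuptools import setup, find_packages

setup(
    name='tcli',
    version='0.1.0',  # placeholder version
    packages=find_packages(),
    # minimal dependencies, as discussed in the text
    install_requires=[
        'click',
    ],
    # the wrapper executable: running `tcli` calls cli() in tcli/cli.py
    entry_points='''
        [console_scripts]
        tcli=tcli.cli:cli
    ''',
)
```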


The resulting file structure should look like this:

tree ~/sysadvent/
~/sysadvent/
└── tcli
    ├── setup.py
    └── tcli
        ├── __init__.py
        └── cli.py
That’s all we need for our ‘hello tcli’ implementation. We’ll install our newly
crafted Python package as being editable - this just means we’ll be able to
modify its code in-place without having to rerun pip:

pip install --editable $PWD

pip will read our setup.py file and first install the minimal needed
dependencies listed in the install_requires array. You might know another
mechanism for specifying Python deps using requirements.txt, which we will not
use here. Last it installs a wrapper executable named tcli pointing to our
cli() function inside tcli/cli.py. It does this using the configuration values
found under entry_points, which are documented in the Python Packaging User
Guide.
Be warned that Python packaging and distribution is a large and sometimes
painful subject. Outside internal dev environments I highly recommend
simplifying your life by using fpm.

That should be all, if the stars aligned correctly we’re now ready for the
inaugural tcli run in our shell. It will show a help message and exit:

(tcli) beddari@mio:~/sysadvent/tcli$ tcli
Usage: tcli [OPTIONS] COMMAND [ARGS]...

  tcli is a modular command line tool wrapping and simplifying common team
  related tasks.

Options:
  --help  Show this message and exit.

Not bad!

Adding commands

As seen above, the only thing we can do so far is specify the --help option,
which is also done by default when no arguments are given. Going back to our
design, remember that we decided to allow only two specific UX semantics in our
command syntax. Add the following code below the cli() function in tcli/cli.py:

@cli.group()
def christmas():
    """This is the christmas module."""


@christmas.command()
@click.option('--count', default=1, help='number of greetings')
@click.argument('name')
def greet(count, name):
    for x in range(count):
        click.echo('Merry Christmas %s!' % name)

At this point, we should treat the @sysadvent
team to the number of greetings we think they deserve:

tcli christmas greet --count 3 "@sysadvent team"

The keys to understanding what is going on here are the @cli.group() and
@christmas.command() lines: greet() is a command belonging to the
christmas group, which in turn belongs to our top level click group. The
Click library uses decorators (a common Python pattern) to achieve this.
Spending some hours with the Click documentation we should now be able to
write quite complex command line tools, using minimal Python boilerplate code.

In our design, we defined goals for how we want to be able to extend our
command line tool, and that is where we’ll go next.

Plugging it together

The Click library is quite popular and there’s a large number of
third party extensions available. One such plugin is click-plugins, which
we’ll use to make it possible to extend our main command line script. In Python
terms, plugins can be separate packages that we’ll be able to discover and load
via setuptools entry_points. In non-Python terms this means we’ll be able to
build a plugin using a separate codebase and have it publish itself as
available for the main script.
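As a sketch of how that publishing could look (the entry point group name tcli.plugins and the oncall module path are illustrative assumptions, not taken from the article), a plugin package's setup.py would register its Click group under the entry point group that the main script scans:

```python
from setuptools import setup, find_packages

setup(
    name='tcli-oncall',
    version='0.1.0',  # placeholder version
    packages=find_packages(),
    install_requires=['click'],
    # the main tcli script iterates over this entry point group at runtime
    # and attaches every command or group it finds as a sub-command
    entry_points='''
        [tcli.plugins]
        oncall=tcli_oncall.cli:oncall
    ''',
)
```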

We want to make it possible for external Python code to register at the
module level of the UX semantics we defined earlier. To make our main tcli
script dynamically look for registered plugins at runtime we’ll need to modify
it a little:

The first 9 lines of tcli/cli.py should now look like this:

from pkg_resources import iter_entry_points

import click
from click_plugins import with_plugins


@with_plugins(iter_entry_points('tcli.plugins'))
@click.group()
def cli():

Next, we’ll need to add click-plugins to the install_requires array in our
setup.py file. Having done that, we reinstall our project using the same
command originally used:

pip install --editable $PWD

Reinstalling is needed here because we’re changing not only code, but also the
Python package setup and dependencies.

To test if our new plugin interface is working, clone and install the example
tcli-oncall project:

cd ~/sysadvent/
git clone
cd tcli-oncall
pip install --editable $PWD

After installing, we have some new example dummy commands and code to play
with:
tcli oncall take "a bath"

Take a look at the setup.py and tcli_oncall/cli.py files in this project
to see how it works.

There’s bash in my Python!

The plugin interface we defined above obviously only works for native Python
code. An important goal for us is however to integrate and run any executable
as part of our cli as long as it is useful and follows the rules we set. In
order to do that, we’ll replicate how git extensions work to add commands
that appear as if they were built-in.

We create a new file tcli/utils.py in our tcli project and add the following
code (adapted from this gist) to it:


import os
import re
import itertools
from stat import S_IMODE, S_ISREG, ST_MODE

def is_executable_posix(path):
    """Whether the file is executable.
    Based on a helper from the stdlib."""
    try:
        st = os.stat(path)
    except os.error:
        return None

    isregfile = S_ISREG(st[ST_MODE])
    isexemode = (S_IMODE(st[ST_MODE]) & 0111)
    return bool(isregfile and isexemode)

def canonical_path(path):
    return os.path.realpath(os.path.normcase(path))

The header imports some modules we’ll need, and next follow two helper
functions. The first checks if a given path is an executable file; the second
normalizes paths by resolving any symlinks in them.

Next we’ll add a function to the same file that uses these two helpers to
search through all directories in our PATH for executables matching a regex
pattern. The function returns a list of pairs of plugin names and executables
we’ll shortly be adding as modules in our tool:

def find_plugin_executables(pattern):
    filepred = re.compile(pattern).search
    filter_files = lambda files: itertools.ifilter(filepred, files)
    is_executable = is_executable_posix

    seen = set()
    plugins = []
    for dirpath in os.environ.get('PATH', '').split(os.pathsep):
        if os.path.isdir(dirpath):
            rp = canonical_path(dirpath)
            if rp in seen:
                continue
            seen.add(rp)
            for filename in filter_files(os.listdir(dirpath)):
                path = os.path.join(dirpath, filename)
                isexe = is_executable(path)

                if isexe:
                    cmd = os.path.basename(path)
                    name = re.search(pattern, cmd).group(1)
                    plugins.append((name, cmd))
    return plugins

Back in our main tcli/cli.py, add another function and a loop that iterates
through the executables we’ve found to tie this together:


import tcli.utils
from subprocess import call

def add_exec_plugin(name, cmd):
    @cli.command(name=name, context_settings=dict(
        ignore_unknown_options=True,
    ))
    @click.argument('cmd_args', nargs=-1, type=click.UNPROCESSED)
    def exec_plugin(cmd_args):
        """Discovered exec module plugin."""
        cmdline = [cmd] + list(cmd_args)
        call(cmdline)

# regex filter for matching executable filenames starting with 'tcli-'
FILTER = "^%s-(.*)$" % __package__
for name, cmd in tcli.utils.find_plugin_executables(FILTER):
    add_exec_plugin(name, cmd)

The add_exec_plugin function adds a little bit of magic: it has an inner
function exec_plugin that represents the command we are adding, dynamically.
The function stays the same every time it is added; only its variable data
changes. Perhaps surprising is that the cmd variable is also addressable inside
the inner function. If you think this sort of thing is interesting, the topics
to read more about are scopes, namespaces and decorators.
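The same capture can be shown with a stdlib-only sketch of the factory pattern (the names here are illustrative, not from the article): each call to the outer function produces a fresh inner function with its own copy of cmd.

```python
def make_runner(cmd):
    """Factory: returns a function that builds a command line for `cmd`."""
    def runner(args):
        # `cmd` is resolved from the enclosing scope, just like in
        # add_exec_plugin's inner exec_plugin function
        return [cmd] + list(args)
    return runner

# each closure keeps its own `cmd`, even after make_runner() returns
run_ls = make_runner('ls')
run_du = make_runner('du')

print(run_ls(['-la']))       # ['ls', '-la']
print(run_du(['-sh', '.']))  # ['du', '-sh', '.']
```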

With a dynamic search and load of tcli- prefixed executables in place, we
should test if it works as it should. Make a simple wrapper script named
tcli-ls in your current directory, and remember to chmod +x it:

#!/bin/bash
ls "$@"

Running the tcli command will now show a new module called ‘ls’ which we can
run, adding the current directory to our PATH for the test:

export PATH=$PATH:.
tcli ls -la --color

Yay, we made ourselves a new way of calling ls. Perhaps time for a break ;-)

An old man and his Docker

As the above mechanism can be used to plug any wrapper in as a module, we now
have a quick way to hook Docker images in as tcli modules. Here’s a simple
example, saved as tcli-builder, that runs Packer:

#!/bin/bash
sha256="..."  # pin the image digest you want to run
docker run --rm -it "hashicorp/packer@sha256:$sha256" "$@"

The last command below should run the entrypoint from hashicorp/packer,
and we’ve reached YALI (Yet Another Layer of Indirection):

export PATH=$PATH:.
tcli builder

Hopefully it is obvious how this can be useful in a team setting. However,
creating bash wrappers for Docker isn’t that great; it would be a better and
faster UX if we could discover what (local?) containers to load as tcli modules
automatically. One idea to consider is an implementation where tcli uses data
from Docker labels following Label Schema. The org.label-schema.name and
org.label-schema.description labels would be of immediate use, representing
the module command name and a single line of descriptive text, suitable for the
top level tcli --help command output. Docker has an easy-to-use Python API,
so anyone considering that as a project should start from there.

Other plugin ideas

The scope of what we could or should be doing with the team cli idea is
interesting - bring your peers in and discuss! For me however, the fact that it
runs locally, inside our personal dev envs, is a large plus.

Here’s a short list of ideas to consider where I believe a team cli could bring
value:

  • git projects management, submodules replacement, templating

    tcli project list # list your teams git repositories, with descriptions
    tcli project create # templating
    tcli project [build|test|deploy]

    This is potentially very useful for my current team at $WORK. I’m planning to
    research how to do this with a control repo pattern.

  • Secrets management

    While waiting for our local Vault implementation team to drink all of their
    coffee, we can try making a consistent interface to (a subset of) the problem.
    Plugging in our current solution (or non-solution) would help, at least.

    If you don’t already have a gpg wrapper I’d look at blackbox.

  • Shared web bookmarks

    tcli web list
    tcli web open dashboard
    tcli web open licensing

    Would potentially save hours of searching in a matter of weeks ;-)

  • On-call management

    E.g as the example tcli-oncall Python plugin we used earlier.

  • Dev environment testing, reporting, management

    While having distributed dev environments is something I’m a big fan of it
    is sometimes hard figuring out just WHAT your coworker is doing. Running
    tests in each team members context to verify settings, versioning and so on
    is very helpful.

    And really there’s no need for every single one of us to have our own,
    non-shared Golang binary unzip update routine.

Wait, what just happened?

We had an idea, explored it, and got something working! At this stage our team
cli can run almost anything, and do so with an acceptable UX, a minimum of
consistency and very little code. Going further, we should probably add some
tests, at least for the functions in tcli.utils. Also, an even thinner design
of the core, where discovering executables is a plugin in itself, would be
better. If someone wants to help make this a real project and iron out these
wrinkles, please contact me!

You might have noticed I didn’t bring up the CLI-versus-ChatOps arguments
again. Truth is there is not much to discuss; I just wanted to present this
idea as an alternative, and the term ChatOps gets people thinking about the
correct problem sets. A fully distributed team would most likely try harder to
avoid centralized services than others. There is quite some power to be had by
designing your main automated pipeline to act just as another team member,
driving the exact same components and tooling as us non-robots.

In more descriptive, practical terms it could be you notifying your team ‘My
last build at commit# failed this way’ through standardized tooling, as
opposed to the more common model where all build pipeline logic and message
generation happens centrally.

by Christopher Webber ( at December 25, 2016 05:00 AM

December 24, 2016

System Administration Advent Calendar

Day 24 - Migrating from mrepo to reposync

Written by: Kent C. Brodie (


We are a RedHat shop (in my case, many CentOS servers, and some RedHat as well). To support the system updates around all of that I currently use mrepo, an open source repository mirroring tool created by Dag Wieers. Mrepo is an excellent yum repository manager that has the ability to house, manage, and mirror multiple repositories. Sadly for many, mrepo’s days are numbered. Today, I’m going to cover why you may need to move from using mrepo, and how to use reposync in its place.

For me, mrepo has thus far done the job well. It allows you to set up and synchronize multiple repositories all on the same single server. In my case, I have been mirroring RedHat 6/7, and Centos 6/7 and it has always worked great. I’ve had this setup for years, dating back to RedHat 5.

While mirroring CentOS with mrepo is fairly trivial, mirroring RedHat updates requires a little extra magic: mrepo uses a clever “registration” process to register a system to RedHat’s RHN (Red Hat Network) service, so that the fake “registered server” can get updates.

Let’s say you have mrepo and wanted to set up a RedHat 6 repository. The key part of this process uses the “gensystemid” command, something like this:

gensystemid -u RHN_username -p RHN_password --release=6Server --arch=x86_64 /srv/mrepo/src/6Server-x86_64/

This command actually logs into RedHat’s RHN, and “registers” the server with RHN. Now that this fake-server component of mrepo is allowed to access RedHat’s updates, it can begin mirroring the repository. If you log into RedHat’s RHN, you will see a “registered server” that looks something like this:

Redhat RHN registered server screen


So what’s the issue? For you RedHat customers, if you’re still using RHN in any capacity, you hopefully have seen this notice by now:

Redhat RHN warning

Putting this all together: If you’re using mrepo to get updates for RedHat servers, that process is going to totally break in just over 7 months. mrepo’s functionality for RedHat updates depends on RedHat’s RHN, which goes away July 31st.

Finally, while mrepo is still used widely, it is worth noting that it appears continued development of mrepo ceased over four years ago. There have been a scattering of forum posts out there that mention trying to get mrepo to work with RedHat’s new subscription-management facility, but I never found a documented solution that works.


Reposync is a command-line utility that’s included with RedHat-derived systems as part of the yum-utils RPM package. The beauty of reposync is its simplicity. At the core, an execution of reposync will examine all of the repositories that the system you’re running it on has available, and downloads all of the included packages to local disk. Technically, reposync has no configuration. You run it, and then it downloads stuff. mrepo on the other hand, requires a bit of configuration and customization per repository.


You simply have to think about the setup differently. In our old model, we had one server that acted as the master repository for all things, whether it was RedHat 6, CentOS 7, whatever. This one system was “registered” multiple times, to mirror RPMS for multiple operating system variants and versions.

In the new model, we have to divide things up. You will need one dedicated server per operating system version. This is because any given server can only download RPMs specific to the operating system version that server is running. Fortunately with today’s world of hosting virtual machines, this isn’t an awful setup, it’s actually quite elegant. In my case, I needed a dedicated server for each of: RedHat 6, RedHat 7, CentOS 6, and CentOS 7.

For the RedHat servers, the elegant part of this solution deals with the fact that you no longer need to use “fake” system registration tools (aka gensystemid). You simply register each of the repository servers using RedHat’s preferred system registration: the “subscription-manager register” command that RedHat provides (with the retirement of RHN coming, the older rhn_register command is going bye-bye). mrepo, at present, does not really have a way to do this using RedHat’s “new” registration mechanism.


The best way for you to see how reposync works is to try it out. For this example, I highly recommend starting with a fresh new server. Because I want to show the changes that occur with subscribing the server to extra channel(s), I am using RedHat 6. You are welcome to use CentOS but note the directory names created will be different and by default the server will already be subscribed to the ‘extras’ channel.

For my example, perform the following steps to set up a basic reposync environment:
  • Install a new server with RedHat 6. A “Basic” install is best.
  • Register the server with RedHat via the subscription-manager command.
  • Do NOT yet add this server to any other RedHat channels.
  • Do NOT yet install any extra repositories like EPEL.
  • Install the following packages via YUM: yum-utils, httpd.
  • Remove /etc/httpd/conf.d/welcome.conf. (The repository will not have an index web page, so by removing this, you’re not redirected to a default apache error document.)
  • Ensure the system’s firewall is set so that you can reach this server via a web browser.

The simplest form of the reposync command will download all packages from all channels your system is subscribed to, and place them in a directory of your choosing.

The following command will download thousands of packages and build a full local RedHat repository, including updates:

reposync -p /var/www/html

The resulting directory structure will look like this:

/var/www/html
`-- rhel-6-server-rpms
    `-- Packages

If you point your web browser to http://repohost/rhel-6-server-rpms/Packages, you should see all of your packages.

Use RedHat’s management portal to add this same server to RedHat’s “Optional packages” channel. For my example, I also installed the EPEL repository to my yum environment.

With the server now ‘subscribed’ to more stuff (RedHat’s optional channel and EPEL), a subsequent reposync command like the one performed above now generates the following:

/var/www/html
|-- epel
|-- rhel-6-server-optional-rpms
|   `-- Packages
`-- rhel-6-server-rpms
    `-- Packages

Note: EPEL is a simpler repo, it does not use a “Packages” subdirectory.

Hopefully this all makes sense now. The reposync command examines what repositories your server belongs to, and downloads them. Reminder: you need one ‘reposync’ server for each major operating system version you have, because each server can only download RPMs and updates specific to the version of the operating system the server is running.


One more step, actually. A repository directory full of RPM files is only the first of two pieces. The second is the metadata. The repository metadata is set up using the “createrepo” command. The output of this will be a “repodata” subdirectory containing critical files YUM requires to sort things out when installing packages.

Using a simple example and our first repository from above, let’s create the metadata:

createrepo /var/www/html/rhel-6-server-rpms/

After which our directory structure now looks like this:

/var/www/html
|-- epel
|-- rhel-6-server-optional-rpms
|   `-- Packages
`-- rhel-6-server-rpms
    |-- Packages
    `-- repodata

You will need to repeat the createrepo command for each of the individual repositories you have. Each time you use reposync, it should be followed by a subsequent execution of createrepo. The final step in all of this to keep current is the addition of cron job entries that usually run reposync and createrepo every night.


Both reposync and createrepo have several command options. Here are some key options that I found useful and explanations as to when or why to use them.



--download-metadata (reposync)

This downloads not only the RPMs, but also extra metadata that may be useful, the most important of which is an XML file that contains version information as it relates to updates. Whether it is present totally depends on the particular repository you’re syncing.


--downloadcomps (reposync)

Also download the comps.xml file. The comps.xml file is critical to deal with “grouping” of packages (for example, “yum groupinstall Development-tools” will not function unless the repository has that file).


--newest-only (reposync)

Only download the latest versions of each RPM. This may or may not be useful, depending on whether you only want the absolute newest of everything, or whether you want ALL versions of everything.



--groupfile comps.xml (createrepo)

If you have a comps.xml file for your repository, you need to tell createrepo exactly where it is.

--workers N (createrepo)

The number of worker processes to use. This is super handy for repositories that have thousands and thousands of packages. It speeds up the createrepo process significantly.


--update (createrepo)

Do an “update” versus a full new repo. This drastically cuts down on the I/O needed to create the final resulting metadata.


The main point of this SysAdvent article was to help those using mrepo today wrap their heads around reposync, and (no thanks to RedHat) to explain why you need to move away from mrepo to something else like reposync if you’re an RHN user. My goal was to provide some simple examples and to help you understand how it works.

If you do not actually have official RedHat servers (for example, you only have CentOS), you may be able to keep using mrepo for quite some time, despite the tool not having had any active development in years. Clearly, a large part of mrepo’s functionality will break after 7/31/2017. Regardless of whether you’re using RedHat or CentOS, reposync is in my opinion an excellent and really simple alternative to mrepo. The only downside is you need multiple servers (one for each OS version), but virtualization helps keep that to a minimal expense.

by Christopher Webber ( at December 24, 2016 05:00 AM

December 23, 2016

System Administration Advent Calendar

Day 23 - That Product Team Really Brought The Room Together

Written by: H. “Waldo” Grunenwald (@gwaldo)
Edited by: Cody Wilbourn (

There are plenty of articles talking about DevOps and Teamwork and Aligning Authority with Responsibility, but what does that look like in practice?

Having been on many different kinds of teams, and having run a Product Team, I will talk about why I think that Product Teams are the best way to create and run products sustainably.

Hey, Didn’t you start with “DevOps Doesn’t Work” last time?

Yes, (yes I did). And I believe every word of it. I consider Product Teams to be a definitive implementation of “Scaling DevOps” which so many people seem to struggle with when the number of people involved scales beyond a conference room.

To my mind, Product Teams are the best way to ensure that responsibility is aligned with authority, that the applications you need are operated sustainably, and that a given application is unlikely to become “Legacy”.

What do you mean “Legacy”?

There is a term that we use in this industry, but I don’t think that I’ve ever seen it be well-defined. In my mind, a Legacy Product is:

  1. Uncared For: Not under active development. Any releases are rare, using old patterns, and are often the result of a security update breaking functionality, causing a fire-drill of fixing dependencies.
  2. In an Orphanage: The people who are responsible for it don’t feel that they own it, but are stuck with it.

If there is a team that actively manages a legacy product, they might not be really equipped to make significant changes. Most of the time they are tasked only with keeping this product barely running, and may have a portfolio of other products in similar state. This “Legacy Team” might carry a connotation of being staffed by “second-string” engineers, and it might be a dumping ground for many apps that aren’t currently in active development.

What are we coming from?

The assumed situation is that there is a product or service that is defined by “business needs”.
A decision is made that these goals are worthwhile, and a Project is defined.
This may be a new product or service, or it may be features to an existing product or service. At some point this Project goes into “Production”, where it is hopefully consumed by users, and hopefully it provides value.

Here’s where things get tricky.

In most companies, the team that writes the product is not the same team that runs the product. This is because many companies organize themselves into departments. Those departments often have technical distinctions like “Development” or “Engineering”, and “Quality Assurance”, and an “Operations” and/or “Systems” groups. In these companies, people are aligned along job function, but each group is responsible for a phase of a product’s lifecycle.

And this is exactly where the heart of the problem is:

The first people who respond to a failure of the application aren’t the application’s developers, creating a business inefficiency:
Your feedback loop is broken.

As a special bonus, some companies organize their development into a so-called “Studio Model”, where a “studio” of developers work on one project. When they are done with that project, it gets handed off to a separate team for operation, and another team will handle “maintenance” development work. That original Studio team may never touch that original codebase again! If you have ever had to maintain or operate someone else’s software, you might well imagine the incentives that this drives, like assumptions that everything is available, and latency is always low!

See, the Studio Model is patterned after Movie and Video Game Studios. This can work well if you are releasing a product that doesn’t have an operational component. Studios make a lot of sense if you’re releasing a film. Some applications like single-player Games, and Mobile Apps that don’t rely on Services are great examples of this.

If your product does have an operational component, this is great for the people on the original Studio team, for whom work is an evergreen pasture. Unfortunately it makes things more painful for everyone who has to deal with the aftermath, including the customers. In reality it’s a really efficient way of turning out Legacy code.

Let’s face it, your CEO doesn’t care that you wrote code real good. They care that the features and products work well, and are available so that they bring in money. They want an investment that pays off.

Having Projects isn’t a problem. But funding teams based on Projects is problematic. You should organize around Products.

Ok, I’ll bite. What’s a Product Team?

Simply put, a Product Team is a team that is organized around a business problem. The Product Team is comprised of people such that it is largely Self-Contained, and collectively the team Owns its own Products. It is “long-lived”, as the intention is that the team is left intact as long as the product is in service.

Individuals on the team will have “Specialties”, but “that’s not my job” doesn’t exist. The QA Engineer specializes in determining ways of assuring that software does what’s expected to. They are responsible for the writing of useful test cases, but they are not limited to the writing of tests. Notably, they’re not solely responsible for the writing of tests. Likewise for Operations Engineers, who have specialties in operating software, infrastructure automation, and monitoring, but they aren’t limited to or solely responsible for those components. Likewise for Software Engineers…

But the Product Team doesn’t only include so-called “members of technical staff”. The Product Team may also need other expertise! Design might be an easy assumption, but perhaps you should have a team member from Marketing, or Payments Receivable, or anyone who has domain expertise in the product!

It’s not a matter of that lofty goal of “Everyone can do everything.” Even on Silo teams, this never works. This is “Everyone knows enough to figure anything out“, and ”Everyone feels enough ownership to be able to make changes."

The people on this team are on THIS team. Having or being an engineer on multiple teams is painful and will cause problems.

You mentioned “Aligning Authority with Responsibility” before…

By having the team be closely-knit, and long-lived, certain understandings need to be had. What I mean is that if you want to have a successful product, and a sustainable lifecycle, there are some understandings that need to take place with regards to the staffing:

  • Engineers have a one-to-one relationship with a Product Team.
  • Products have a one-to-one relationship with a Product Team.
  • A Product Team may have a one-to-many relationship with its Products.
  • A Product Team will have a one-to-one relationship with a Pager Rotation.
  • An Engineer will have a one-to-one membership in their team’s Pager Rotation.

Simply put, having people split among many different teams sounds great in theory, but it never works out well for the individuals. The teams never seem to get the attention required from the Individual Contributors, and an Individual Contributor effectively doubles their number of bosses, having to appease them all.


Some developers might balk at being made to participate in the operation of the product that they’re building. This is a natural reaction.
They’ve never had to do that before. Yes, exactly.
That doesn’t mean that they shouldn’t have to. That is the “we’ve always done it this way” argument.

This topic has already been well-covered in another article in this year’s SysAdvent, in Alice Goldfuss’ “No More On-Call Martyrs”, itself well-followed up by @DBSmasher’s “On Being On-Call”.

In this regard, I say that if one’s sleep is on the line - if you are on the hook for the pager - you will take much more care in your assumptions when building a product than if that is someone else’s problem.

The last thing that amazes me is that this is a pattern that is well-documented in many of the so-called “Unicorn Companies”, whose practices many companies seek to emulate, but somehow “Developers-on-Call” is always argued to be “A Bridge Too Far”.

I would argue that this is one of their keystones.

Who’s in Charge

Before I talk about anything else, I have to make one thing perfectly clear. If you have a role in Functional Leadership (Engineering Manager, Operations Director, etc), your role will probably change.

In Product Teams, the Product Owner decides work to be done and priorities.

Within the team you have the skills that you need to create and run it, delegating functions that you don’t possess to other Product Teams. (DBA’s being somewhat rare, and “DB-as-a-Service” being somewhat common.)

Many Engineering and Operations managers were promoted because they were good at Engineering or Ops. Unfortunately it’s then that it sets in that, in Lindsay Holmwood’s words, “It’s not a promotion, it’s a career change”, and also addressed in this year’s SysAdvent article “Trained Engineers - Overnight Managers (or ‘The Art of Not Destroying Your Company’)” by Nir Cohen.

How many of you miss Engineering, but spend all of your time doing… stuff?

Under an org that leverages Product Teams, Functional Leaders have a fundamentally different role than they did before.

Leadership Roles

Under Product Team paradigm, Product Managers are responsible for the work, while Functional Managers are responsible for passing of knowledge, and overseeing the career growth of Individual Contributors.

| Product Managers | Functional Managers |
| --- | --- |
| Owns Product | IC’s Professional Development |
| Product Direction | Coordinate Knowledge |
| Assign Work & Priority | Keeper of Culture |
| Hire & Fire from Team | Involved in Community |
| Decide Team Standards | Bullshit Detector / Voice of Reason |

Product Managers

The Product Manager “Owns the Product”. They are ultimately responsible for the product successfully meeting business needs. Everything else is in support of that. I must stress that it isn’t necessary that a Product Manager be technical, though it does seem to help.

The product owner is the person who understands the business goals. With that knowledge and those stakes, they assign work and priorities such that they’re aligned with those business goals.

Knowing the specific problems that they’re solving and the makeup of their team, they are responsible for hiring and firing from the team.

Because the Product Team is responsible for their own success, and availability (by which I mean, of course, the Pager), they get to make decisions locally. They get to decide themselves what technologies they want to use and suffer.

Finally, the Product Manager evangelizes their product for other teams to leverage, and helps to on-board them as customers.

Functional Managers

At this point, I expect that the Functional managers are wondering “well what do I do?” Functional Managers aren’t dictating what work is done anymore, but there is still a lot of value that they bring. Their job becomes The People.

I don’t know a single functional manager who has been able to attend to their people’s professional development like they feel that they should.

Since technology decisions are made within the Product Team, the Functional Management has a key role in coordinating knowledge between the members of their Community, keeping track of who’s-using-what, and the relevant successes and pitfalls. When one team is considering a new tool that another is using, or a team is struggling with a tech, the functional manager is well-equipped for connecting people.

Functional Managers are the Keepers of Culture, and are encouraged to be involved in Community. That community-building is both within the company and in their physical region.

Functional managers are crucial for Hiring into the company, and helping Product Managers with hiring skills that they aren’t strong with. For instance, I would run a developer candidate by a development manager for a sanity-check, but for a DBA, I’d be very reliant on a DBA Manager’s expertise and opinion!

Relatedly, the Functional Manager serves as a combination Bullshit Detector and Voice-of-Reason when there are misunderstandings between the Product Owners and their Engineers.

The Reason for Broad Standards

Broad standards are often argued for one of two main reasons: either for “hiring purposes”, where engineers may be swapped relatively interchangeably, or because there is a single Ops team responsible for many products, which doesn’t have the ability to cope with multiple ways of doing things. (Since any one Engineer might be called upon to fix many apps in the dark of the night.)

Unfortunately, app development can often be hampered by those Standards that don’t fit their case and needs.

Hahahaha I’m kidding! What really happens is that Dev teams clam up about what they’re doing. They subvert the “standards” and don’t tell anyone, either pleading ignorance or claiming that they can’t go back and rewrite because of a deadline. Best case is that they run a request for an “exemption” up the flagpole, where Ops gets Over-riden. And Operations is still left with a “standard” and pile of “one-offs”.

Duplicate Effort

Another claimed reason for broad “Standards” is to “reduce the amount of duplicated effort”. While this is a great goal, again, it tends to cause more friction than is necessary.

The problem is the fallacy of assuming that the way a problem was solved for one team will be helpful to another. That solution may be helpful, but assuming that it will be, and making it mandatory, is going to cause unnecessary effort.

At one company, my team ran ELK as a product for other teams to consume. A new team was spun up, and asked about our offerings, but asked my opinion of them using a different service (an externally-hosted ELK-as-a-Service). I was thrilled, in fact! I wanted to see if we were solving the problem in the best way, or even a good way, and to be able to come back later for some lessons-learned!

Scaling Teams

At some point, your product is going to get bigger than everyone can keep in their head. It may be time to split up responsibilities into a new team. But where to draw boundaries? Interrogate them!

A trick that I learned a long time ago for testing your design in Object-Oriented Programming is to ask the object a question: “What are you?” or “What do you do?” If the answer includes an “And”, you have two things. This works well for evaluating both Class and Method design. (I think that this tidbit was from Sandi Metz’s “Practical Object-Oriented Design in Ruby” (aka “POODR”), which I was exposed to by Mark Menard of Enable Labs.)

What Doesn’t Work

Because this can be a change to how teams work, it’s important to be clear about the rules. If there is a misunderstanding about where work comes from, or who the individual contributors work for, or who decides the people who belong to what team, this begins to fall apart.

Having people work for multiple sets of managers is untenable.

Having people quit is an unavoidable problem in any company. Having a functional manager decide by themselves that they’re going to reassign one of your people away from you is worse, because they’re not playing by the rules.

WARNING: Matrix Organizations Considered Harmful

If someone proposes a Matrix Org, you need to be extremely careful. It’s important that you keep a separation of Church and State. Matrix Organizations instantly create a conflict between the different axes of managers, with the tension being centered on the individual contributor who just wants to do good work. A Matrix Org actively adds politics.

All Work comes from Product Management. Functional Management is for Individual Careers and Sharing Knowledge.

This shouldn’t be hard to remember, as the Functional Leaders shouldn’t have work to assign. But it will be hard, because they’ll probably have a lot of muscle-memory around prioritizing and assigning work.

Now, I’m sure a lot of you are skeptical about how a product team actually works. You might just not believe me.

If you properly staff a team, give them direction, authority, and responsibility, they will amaze you.

Getting Started

As with anything, the hardest thing to do is begin.

Identifying Products

An easy candidate is a new initiative for development that may be coming down the pipeline, but if you aren’t aware of any new products, you probably have many “orphaned” products already running within your environment.

As I discussed last year, there are plenty of ways of finding products that are critical, but not actually maintained by anyone. Common places to look are tools around development, like CI, SCM, and Wikis. Also commonly neglected are what I like to call “Insight Tools” like Logging, Metrics, and Monitoring/Alerting. These all tend to be installed and treated as appliances, not receiving any maintenance or attention unless something breaks. Sadly, it means that there’s a lot of value left on the table with these products!

Speaking with Leadership

If you say “I want to start doing Product Teams”, they’re going to think of something along the lines of BizDev. A subtle but important difference is to say that you want to organize a cross-functional team that is dedicated to the creation and long-term operation of the Product.

I don’t know why, but it seems that executive go gooey when they hear the phrase “cross-functional team”. So, go buzz-word away. While you’re at it, try to initiate some Thought Leadership and coin a term with them like “Product-Oriented Development”! (No, of course it doesn’t mean anything…)

What you’re looking for is a commitment to fund the product long-term. The idea is that your team will solve problems centered around a set of problems. The team is of “Your People”, that becomes a “we”. Oddly enough, when you have a team focused and aligned together, you have really built a capital-T “Team”.


The Product Team should be intact and in-development as long as the product is found to be necessary. When the product is retired, the product team may be disbanded, but nobody should be left with the check. Over time, the features should stabilize, the bugs will disappear, and the operation of the application should settle to a low level of effort, even including external updates.

That doesn’t mean that your engineers need to be farmed out to other teams; you should take on new work, and begin development of new products that aid in your space!


I believe that organizing work in Product Teams is one of the best ways to run a responsible engineering organization. By orienting your work around the Product, you are aligning your people to business needs, and the individuals will have a better understanding of the value of their work. By keeping the team size small, they know how the parts work and fit. By everyone operating the product, they feel a sense of ownership, and by being responsible for the product’s availability, they’re much more likely to build resilient and fault-tolerant applications!

It is for these reasons and more, that I consider Product Teams to be the definitive DevOps implementation.


I’d like to thank my friends for listening to me rant, and my editor Cody Wilbourn for their help bringing this article together. I’d also like to thank the SysAdvent team for putting in the effort that keeps this fun tradition going.

Contact Me

If you wish to discuss with me further, please feel free to reach out to me. I am gwaldo on Twitter and Gmail/Hangouts and Steam, and seldom refuse hugs (or offers of beverage and company) at conferences. Death Threats and unpleasantness beyond the realm of constructive Criticism may be sent to:

c/o FBI Headquarters  
935 Pennsylvania Avenue, NW  
Washington, D.C.  

by Christopher Webber ( at December 23, 2016 05:00 AM

December 22, 2016

Evaggelos Balaskas

Elasticsearch, Logstash, Kibana or ELK Crash Course 101

Elasticsearch, Logstash, Kibana or ELK Crash Course 101

Prologue aka Disclaimer

This blog post is the outcome of a Hackerspace Event: Logstash Intro Course that happened a few days ago. I prefer doing workshops vs presentations -as I pray to the Live-Coding Gods- and this is the actual workshop in bullet notes.


For our technical goal we will use my fail2ban !
We will figure out (together) whom I ban with my fail2ban!!!

The results we want to present are:

Date IP Country

To help you with this inquiry, we will use this dataset: fail2ban.gz

If you read through this log you will see that it’s a grep from my messages logs.
So at the beginning we have messages from compressed files … and at the end we have messages from uncompressed files.

But … Let’s begin with our journey !!


For our little experiment we need Java

I Know, I know … not the beverage - the programming language !!

try java 1.7.x

# java -version
java version "1.7.0_111"
OpenJDK Runtime Environment (IcedTea 2.6.7) (Arch Linux build 7.u111_2.6.7-1-x86_64)
OpenJDK 64-Bit Server VM (build 24.111-b01, mixed mode)

In my archlinux machine:

# yes | pacman -S jdk7-openjdk


As of October 26, 2016, all the tools (logstash, elastic, kibana) are at version 5.0.x, the latest.
But we will try the well-known, previously installed versions !!!

From 5.0.x onward … we have: Breaking changes, and you will need Java 8.


Let’s download software

# wget -c

# wget -c

# wget -c


Uncompress and test that logstash can run without a problem:

# unzip
# cd logstash-2.4.1

# ./bin/logstash --version
logstash 2.4.1

# ./bin/logstash --help

Basic Logstash Example

Reminder: Ctrl+c breaks the logstash

# ./bin/logstash -e 'input { stdin { } } output { stdout {} }'

We are now ready to type ‘Whatever’ and see what happens:

# ./bin/logstash -e 'input { stdin { } } output { stdout {} }'
Settings: Default pipeline workers: 4
Pipeline main started


2016-11-15T19:18:09.638Z myhomepc whatever

Ctrl + c
Ctrl + c

^CSIGINT received. Shutting down the agent. {:level=>:warn}
stopping pipeline {:id=>"main"}
Received shutdown signal, but pipeline is still waiting for in-flight events
to be processed. Sending another ^C will force quit Logstash, but this may cause
data loss. {:level=>:warn}
^CSIGINT received. Terminating immediately.. {:level=>:fatal}

Standard Input and Standard Output

In this first example the input is our standard input, that means keyboard
and standard output means our display.

We typed:

whatever

and logstash reports:

2016-11-15T19:18:09.638Z myhomepc whatever

There are three (3) fields:

  1. timestamp : 2016-11-15T19:18:09.638Z
  2. hostname : myhomepc
  3. message : whatever

Logstash Architecture


Logstash architecture reminds me of Von Neumann’s.

Input --> Process --> Output 

In Process we have filter plugins, and in input plugins & output plugins we have codec plugins.

Codec plugins

We can define the data representation (logs or events) via codec plugins. The most basic codec plugin is rubydebug.


eg. logstash -e 'input { stdin { } } output { stdout { codec => rubydebug} }'

# ./bin/logstash -e 'input { stdin { } } output { stdout { codec => rubydebug} }'
Settings: Default pipeline workers: 4
Pipeline main started


       "message" => "whatever",
      "@version" => "1",
    "@timestamp" => "2016-11-15T19:40:46.070Z",
          "host" => "myhomepc"

^CSIGINT received. Shutting down the agent. {:level=>:warn}
stopping pipeline {:id=>"main"}
^CSIGINT received. Terminating immediately.. {:level=>:fatal}


Let’s try the json codec plugin, but now we will try it via a linux pipe:

# echo whatever | ./bin/logstash -e 'input { stdin { } } output { stdout { codec => json }  }' 

Settings: Default pipeline workers: 4
Pipeline main started


Pipeline main has been shutdown
stopping pipeline {:id=>"main"}


# echo -e 'whatever1\nwhatever2\n\n' | ./bin/logstash -e 'input { stdin { } } output { stdout { codec => json_lines }  }'

Settings: Default pipeline workers: 4
Pipeline main started


Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

List of codec

Here is the basic list of codec:


Configuration File

It is not very efficient to run everything from the command line, so we will move to a configuration file:


input {
    stdin { }
}

output {
    stdout {
        codec => rubydebug
    }
}

and run the above example once more:

# echo -e 'whatever1\nwhatever2' | ./bin/logstash -f logstash.conf 

Settings: Default pipeline workers: 4
Pipeline main started

       "message" => "whatever1",
      "@version" => "1",
    "@timestamp" => "2016-11-15T19:59:51.146Z",
          "host" => "myhomepc"
       "message" => "whatever2",
      "@version" => "1",
    "@timestamp" => "2016-11-15T19:59:51.295Z",
          "host" => "myhomepc"

Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

Config Test

Every time you need to test your configuration file for syntax check:

./bin/logstash -f logstash.conf --configtest

Configuration OK

fail2ban - logstash 1st try

Now it’s time to test our fail2ban file against our logstash setup. To avoid the terror of 22k lines, we will test the first 10 lines to see how it works:

# head ../fail2ban | ./bin/logstash -f logstash.conf

Settings: Default pipeline workers: 4
Pipeline main started

       "message" => "messages-20160918.gz:Sep 11 09:13:13 myhostname fail2ban.actions[1510]: NOTICE [apache-badbots] Unban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.784Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 09:51:08 myhostname fail2ban.actions[1510]: NOTICE [apache-badbots] Unban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.966Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 11:51:24 myhostname fail2ban.filter[1510]: INFO [apache-badbots] Found",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.967Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 11:51:24 myhostname fail2ban.actions[1510]: NOTICE [apache-badbots] Ban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.968Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 14:58:35 myhostname fail2ban.filter[1510]: INFO [postfix-sasl] Found",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.968Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 14:58:36 myhostname fail2ban.actions[1510]: NOTICE [postfix-sasl] Ban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.969Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 15:03:08 myhostname fail2ban.filter[1510]: INFO [apache-fakegooglebot] Ignore by command",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.970Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 15:03:08 myhostname fail2ban.filter[1510]: INFO [apache-fakegooglebot] Ignore by command",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.970Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 15:26:04 myhostname fail2ban.filter[1510]: INFO [apache-fakegooglebot] Ignore by command",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.971Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 17:01:02 myhostname fail2ban.filter[1510]: INFO [apache-badbots] Found",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:10:40.971Z",
          "host" => "myhomepc"

Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

fail2ban - filter

As we said at the beginning of our journey, we want to check which IPs I ban with fail2ban !!
So we need to filter the messages. Reading through our dataset, we will soon find out that we need lines like:

"messages-20160918.gz:Sep 11 11:51:24 myhostname fail2ban.actions[1510]: NOTICE [apache-badbots] Ban"

so we could use an if-statement (a conditional statement).

fail2ban - Conditionals

You can use the following comparison operators:

    equality: ==, !=, <, >, <=, >=
    regexp: =~, !~ (checks a pattern on the right against a string value on the left)
    inclusion: in, not in 

The supported boolean operators are:

    and, or, nand, xor 

The supported unary operators are:

    !

Expressions can be long and complex.
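As an illustration of combining these operators (this example is not part of the original workshop; the tag name is made up), a conditional in a filter block might look like:

```
filter {
    # keep only Ban events; the =~ operator treats the string as a regexp
    if [message] =~ " Ban " and [message] !~ "fakegooglebot" {
        mutate { add_tag => [ "banned" ] }
    } else {
        drop { }
    }
}
```

The same operators work in output blocks too, e.g. to route events to different destinations.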

fail2ban - message filter

With the above knowledge, our logstash configuration file can now be:


input {
    stdin { }
}

filter {
    if [message]  !~ ' Ban ' {
        drop { }
    }
}

output {
    stdout {
        codec => rubydebug
    }
}

and the results:

# head ../fail2ban | ./bin/logstash -f logstash.conf -v

       "message" => "messages-20160918.gz:Sep 11 11:51:24 myhostname fail2ban.actions[1510]: NOTICE [apache-badbots] Ban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:33:39.858Z",
          "host" => "myhomepc"
       "message" => "messages-20160918.gz:Sep 11 14:58:36 myhostname fail2ban.actions[1510]: NOTICE [postfix-sasl] Ban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T20:33:39.859Z",
          "host" => "myhomepc"

but we are pretty far away from our goal.

The above approach is just fine for our example, but it is far from perfect or even elegant !
And here is why: the regular expression ‘ Ban ‘ is just that, a regular expression, and it could match more than we intend.

The most elegant approach is to match the entire message and drop everything else. Then we could be certain about the output of the logs.


And here comes grok !!!

and to do that we must learn grok:

Parses unstructured event data into fields

that would be extremely useful. Remember, we have a goal!
We don’t need everything, we need the date, IP & country !!

Grok Patterns

grok works with patterns, which follow the generic rule below:

%{SYNTAX:SEMANTIC}
You can use the online grok debugger (grok heroku) to test your messages/logs/events against grok patterns.

If you click on the left grok-patterns you will see the most common grok patterns.

In our setup:

# find . -type d -name patterns

the latest directory is where our logstash instance keeps the default grok patterns.

To avoid the suspense … here is the full grok pattern:


grok - match

If we run this new setup, we will see something peculiar:


input {
    stdin { }
}

filter {

#    if [message]  !~ ' Ban ' {
#        drop { }
#    }

    grok {
        match => {
            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP} %{HOSTNAME} %{SYSLOGPROG}: %{LOGLEVEL} \[%{PROG}\] Ban %{IPV4}"
        }
    }
}

output {
    stdout {
        codec => rubydebug
    }
}

We will get messages like these:

       "message" => "messages:Nov 15 17:49:09 myhostname fail2ban.actions[1585]: NOTICE [apache-fakegooglebot] Ban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T21:30:29.345Z",
          "host" => "myhomepc",
       "program" => "fail2ban.actions",
           "pid" => "1585"
       "message" => "messages:Nov 15 17:49:31 myhostname fail2ban.action[1585]: ERROR /etc/fail2ban/filter.d/ignorecommands/apache-fakegooglebot -- stdout: ''",
      "@version" => "1",
    "@timestamp" => "2016-11-15T21:30:29.346Z",
          "host" => "myhomepc",
          "tags" => [
        [0] "_grokparsefailure"

It matched some of them, and all the rest were tagged with _grokparsefailure.

We can remove them easily:


input {
    stdin { }
}

filter {

#    if [message]  !~ ' Ban ' {
#        drop { }
#    }

    grok {
        match => {
            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP} %{HOSTNAME} %{SYSLOGPROG}: %{LOGLEVEL} \[%{PROG}\] Ban %{IPV4}"
        }
    }

    if "_grokparsefailure" in [tags] {
        drop { }
    }
}

output {
    stdout {
        codec => rubydebug
    }
}

Using the colon (:) character on a SYNTAX grok pattern creates a new field for grok / logstash.
So we can change the above grok pattern a little bit, to this:


but then again, we only want to keep some fields, like the date and IP, so:

messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] Ban %{IPV4:ip}


input {
    stdin { }
}

filter {

#    if [message]  !~ ' Ban ' {
#        drop { }
#    }

    grok {
        match => {
            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] Ban %{IPV4:ip}"
        }
    }

    if "_grokparsefailure" in [tags] {
        drop { }
    }
}

output {
    stdout {
        codec => rubydebug
    }
}

output will be like this:

       "message" => "messages:Nov 15 17:49:32 myhostname fail2ban.actions[1585]: NOTICE [apache-fakegooglebot] Ban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T21:42:21.260Z",
          "host" => "myhomepc",
          "date" => "Nov 15 17:49:32",
            "ip" => ""

grok - custom pattern

If we want to match something specific with a custom grok pattern, we can simply add one!

For example, we want to match Ban and Unban action:

# vim ./vendor/bundle/jruby/1.9/gems/logstash-patterns-core-2.0.5/patterns/ebal
ACTION (Ban|Unban)

and then our grok match line transforms to:


input {
    stdin { }
}

filter {

#    if [message]  !~ ' Ban ' {
#        drop { }
#    }

    grok {
        match => {
#            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] Ban %{IPV4:ip}"
            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] %{ACTION:action} %{IPV4:ip}"
        }
    }

    if "_grokparsefailure" in [tags] {
        drop { }
    }
}

output {
    stdout {
        codec => rubydebug
    }
}


{
       "message" => "messages:Nov 15 18:13:53 myhostname fail2ban.actions[1585]: NOTICE [apache-badbots] Unban",
      "@version" => "1",
    "@timestamp" => "2016-11-15T21:53:59.634Z",
          "host" => "myhomepc",
          "date" => "Nov 15 18:13:53",
        "action" => "Unban",
            "ip" => ""
}
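A side note: editing files under ./vendor/ is fragile, since they get overwritten when logstash-patterns-core is upgraded. The grok filter also supports a patterns_dir option, so a sketch like the following (the ./patterns path is an assumption) keeps custom patterns outside the vendor tree:

```
# ./patterns/ebal contains the single line:  ACTION (Ban|Unban)
grok {
    patterns_dir => ["./patterns"]
    match => {
        "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] %{ACTION:action} %{IPV4:ip}"
    }
}
```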


We are getting pretty close … the most difficult part (the grok patterns) is over.
We just need to remove any extra fields. We can do that in two ways:

  1. grok - remove_field
  2. mutate - remove_field

We’ll try mutate because it is more powerful.

And for our example/goal we will not use the custom extra Action field, so:


input {
    stdin { }
}

filter {

#    if [message]  !~ ' Ban ' {
#        drop { }
#    }

    grok {
        match => {
            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] Ban %{IPV4:ip}"
#            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] %{ACTION:action} %{IPV4:ip}"
        }
    }

    if "_grokparsefailure" in [tags] {
        drop { }
    }

    mutate {
        remove_field => [ "message", "@version", "@timestamp", "host" ]
    }
}

output {
    stdout {
        codec => rubydebug
    }
}


{
    "date" => "Nov 15 17:49:32",
      "ip" => ""
}

so close !!!

mutate - replace

According to the syslog RFCs (RFC 3164 - RFC 3195):

 In particular, the timestamp has a year, making it a nonstandard format

so most logs don't have a YEAR in their timestamp!

Logstash can add an extra field or replace an existing field:


input {
    stdin { }
}

filter {

#    if [message]  !~ ' Ban ' {
#        drop { }
#    }

    grok {
        match => {
            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] Ban %{IPV4:ip}"
#            "message" => "messages%{DATA}:%{SYSLOGTIMESTAMP:date} %{HOSTNAME} %{PROG}(?:\[%{POSINT}\])?: %{LOGLEVEL} \[%{PROG}\] %{ACTION:action} %{IPV4:ip}"
        }
    }

    if "_grokparsefailure" in [tags] {
        drop { }
    }

    mutate {
        remove_field => [ "message", "@version", "@timestamp", "host" ]
        replace => { "date" => "%{+YYYY} %{date}" }
    }
}

output {
    stdout {
        codec => rubydebug
    }
}

The output:

{
    "date" => "2016 Nov 15 17:49:32",
      "ip" => ""
}
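The same year-prefixing trick can be sketched in Python. The year is hard-coded here for illustration; logstash's %{+YYYY} expands to the year of the event's @timestamp:

```python
from datetime import datetime

def add_year(date_field: str, year: int) -> str:
    # Mimics: replace => { "date" => "%{+YYYY} %{date}" }
    return f"{year} {date_field}"

stamped = add_year("Nov 15 17:49:32", 2016)
print(stamped)  # → 2016 Nov 15 17:49:32

# With the year present, the timestamp parses unambiguously:
parsed = datetime.strptime(stamped, "%Y %b %d %H:%M:%S")
print(parsed.isoformat())  # → 2016-11-15T17:49:32
```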


The only thing missing from our original goal is the country field!

Logstash has a geoip plugin that works perfectly with the MaxMind GeoIP database.

So we need to download the GeoIP database:

# wget -N

The best place to put this file (uncompressed) is under your logstash directory.

Now, it’s time to add the geoip support to the logstash.conf :

  # Add Country Name
  # wget -N
  geoip {
    source => "ip"
    target => "geoip"
    fields => ["country_name"]
    database => "GeoIP.dat"
   # database => "/etc/logstash/GeoIP.dat"
  }

The above goes under the filter section of the logstash conf file.

running the above configuration

# head ../fail2ban | ./bin/logstash -f logstash.conf

should display something like this:

{
     "date" => "2016 Sep 11 11:51:24",
       "ip" => "",
    "geoip" => {
        "country_name" => "Netherlands"
    }
}
{
     "date" => "2016 Sep 11 14:58:36",
       "ip" => "",
    "geoip" => {
        "country_name" => "Russian Federation"
    }
}

We are now pretty close to our primary objective.


It would be nice to somehow translate the geoip -> country_name field to something more useful, like Country.

That’s why we are going to use the rename setting under the mutate plugin:

  mutate {
    rename => { "[geoip][country_name]"  => "Country" }
  }

So let's put them all together:

    geoip {
        source => "ip"
        target => "geoip"
        fields => ["country_name"]
        database => "GeoIP.dat"
    }

    mutate {
        rename => { "[geoip][country_name]"  => "Country" }
        remove_field => [ "message", "@version", "@timestamp", "host", "geoip"]
        replace => { "date" => "%{+YYYY} %{date}" }
    }

Test run it and the output will show something like this:

{
       "date" => "2016 Sep 11 11:51:24",
         "ip" => "",
    "Country" => "Netherlands"
}
{
       "date" => "2016 Sep 11 14:58:36",
         "ip" => "",
    "Country" => "Russian Federation"
}
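The combined geoip + rename step boils down to: look up the IP, nest the result under geoip, then hoist country_name out to a top-level Country field. A purely illustrative Python stand-in (the IP and the one-entry lookup table are invented; the real filter queries the MaxMind database):

```python
# Hypothetical one-entry stand-in for the MaxMind GeoIP database:
GEO_DB = {"93.184.216.34": "United States"}

def enrich(event: dict) -> dict:
    # geoip { source => "ip"  target => "geoip"  fields => ["country_name"] }
    event["geoip"] = {"country_name": GEO_DB.get(event.get("ip", ""), "Unknown")}
    # mutate { rename => { "[geoip][country_name]" => "Country" } }
    event["Country"] = event.pop("geoip")["country_name"]
    return event

print(enrich({"date": "2016 Nov 15 17:49:32", "ip": "93.184.216.34"}))
```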

hurray !!!

Finally, we have completed our primary objective.

Input - Output

Input File

Until now, you have been reading from the standard input, but it’s time to read from a file.
To do so, we must add the below settings under the input section:

file {
    path => "/var/log/messages"
    start_position => "beginning"
}

Testing our configuration file (without giving input from the command line):

./bin/logstash -f logstash.conf

and the output will be something like this:

{
       "path" => "/var/log/messages",
       "date" => "2016 Nov 15 17:49:09",
         "ip" => "",
    "Country" => "United States"
}
{
       "path" => "/var/log/messages",
       "date" => "2016 Nov 15 17:49:32",
         "ip" => "",
    "Country" => "United States"
}

So by changing the input from the standard input to a file path, we added a new extra field:
the path.

Just remove it with mutate -> remove_field as we have already shown above.


Now it’s time to send everything to our elastic search engine:

output {

    # stdout {
    #    codec => rubydebug
    # }

    elasticsearch {
    }

}

Be careful: in our above examples we have removed the @timestamp field,
but for elasticsearch to work, we must enable it again:

remove_field => [ "message", "@version", "host", "geoip"]


Uncompress and run the elastic search engine:

# unzip
# cd elasticsearch-2.4.1/
# ./bin/elasticsearch

elasticsearch is running under:

tcp6       0      0          :::*                    LISTEN      27862/java
tcp6       0      0          :::*                    LISTEN      27862/java

Impressive, but that’s it!


Let’s find out if the elasticsearch engine is running:

$ curl 'localhost:9200/_cat/health?v'

$ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1482421814 17:50:14  elasticsearch yellow          1         1      1   1    0    0        1             0                  -                 50.0%
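For scripting, the _cluster/health endpoint is easier to consume than the _cat output, since it returns JSON. A small Python sketch of checking the status field (the response body is a trimmed, hypothetical sample):

```python
import json

# Trimmed, hypothetical response body from /_cluster/health?pretty=true
body = '{"cluster_name": "elasticsearch", "status": "yellow", "number_of_nodes": 1}'

health = json.loads(body)
# yellow = all primary shards allocated, some replicas unassigned
# (normal on a single-node cluster)
print(health["status"])  # → yellow
```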

$ curl 'localhost:9200/_cat/nodes?v'

host      ip        heap.percent ram.percent load node.role master name
                               7          98 0.50         d      * Hazmat

# curl -s -XGET 'http://localhost:9200/_cluster/health?level=indices' | jq .


Now it’s time to send our data to our elastic search engine, running the logstash daemon with the fail2ban file as input and elastic search as output.


We are almost done. There is only one more step to our 101 course for ELK infrastructure.

And that is the kibana dashboard.

setup kibana

Uncompress and run the kibana dashboard:

 tar xf kibana-4.6.3-linux-x86_64.tar.gz


Now simply open the kibana dashboard on:





December 22, 2016 10:42 PM

That grumpy BSD guy

So somebody is throwing HTML at your sshd. What to do?

Yes, it's exactly as wrong as it sounds. Here's a distraction with bizarre twists for the true log file junkies among you. Happy reading for the holidays!

As will probably not surprise any of my regular readers, I've spent a bit of time recently browsing and processing SSH authentication logs for some of the systems in my care. As usual I browse logs with a view to extracting insights and hopefully at some future date I will be able to present useful material based on analyses of that material.

But sometimes something stands out as just too wrong. Today while browsing archived logs I came across this entry from July:

Jul 8 12:53:17 skapet sshd[88344]: Invalid user <!DOCTYPE from port 57999

That string is the start of an SGML-style document declaration, basically what you would expect to find at the very start of an SGML-ish file such as an HTML document.

Instruct your browser to 'Display source' for this article, and that exact string is the first thing in the HTML source display buffer. But in the context of an authentication log for an SSH service, it's distinctly odd.

And what's more, a little later in the same log I found:

Jul  8 20:59:08 skapet sshd[11083]: Invalid user content="text/html; from port 26240

Again, somebody throwing HTML at my sshd, but this time from a different IP address.

This piqued my interest enough that I decided to take a look at whatever else those jokers had been up to:

[Thu Dec 22 19:40:06] peter@skapet:~$ zgrep /var/log/authlog.23.gz
Jul 8 12:53:17 skapet sshd[88344]: Invalid user <!DOCTYPE from port 57999
Jul 8 12:53:17 skapet sshd[88344]: Failed password for invalid user <!DOCTYPE from port 57999 ssh2
Jul 8 12:53:17 skapet sshd[88344]: Connection closed by port 57999 [preauth]
Jul 8 13:02:15 skapet sshd[85203]: Invalid user PUBLIC from port 58123
Jul 8 13:02:15 skapet sshd[85203]: Failed password for invalid user PUBLIC from port 58123 ssh2
Jul 8 13:02:15 skapet sshd[85203]: Connection closed by port 58123 [preauth]
Jul 8 13:11:13 skapet sshd[25261]: Invalid user XHTML from port 57227
Jul 8 13:11:13 skapet sshd[25261]: Failed password for invalid user XHTML from port 57227 ssh2
Jul 8 13:11:13 skapet sshd[25261]: Connection closed by port 57227 [preauth]
Jul 8 13:20:10 skapet sshd[68619]: Invalid user Strict//EN" from port 25941
Jul 8 13:20:10 skapet sshd[68619]: Failed password for invalid user Strict//EN" from port 25941 ssh2
Jul 8 13:20:10 skapet sshd[68619]: Connection closed by port 25941 [preauth]
Jul 8 13:28:58 skapet sshd[96899]: Invalid user <html from port 48462
Jul 8 13:28:58 skapet sshd[96899]: Failed password for invalid user <html from port 48462 ssh2
Jul 8 13:28:58 skapet sshd[96899]: Connection closed by port 48462 [preauth]
Jul 8 13:37:48 skapet sshd[59363]: Invalid user <meta from port 46496
Jul 8 13:37:48 skapet sshd[59363]: Failed password for invalid user <meta from port 46496 ssh2
Jul 8 13:37:48 skapet sshd[59363]: Connection closed by port 46496 [preauth]
Jul 8 13:46:43 skapet sshd[81970]: Invalid user content="text/html; from port 29652
Jul 8 13:46:43 skapet sshd[81970]: Failed password for invalid user content="text/html; from port 29652 ssh2
Jul 8 13:46:43 skapet sshd[81970]: Connection closed by port 29652 [preauth]
Jul 8 13:55:37 skapet sshd[39952]: Invalid user <title>403 from port 45706
Jul 8 13:55:37 skapet sshd[39952]: Failed password for invalid user <title>403 from port 45706 ssh2
Jul 8 13:55:37 skapet sshd[39952]: Connection closed by port 45706 [preauth]
Jul 8 14:04:33 skapet sshd[68947]: Invalid user Forbidden from port 8465
Jul 8 14:04:33 skapet sshd[68947]: Failed password for invalid user Forbidden from port 8465 ssh2
Jul 8 14:04:34 skapet sshd[68947]: Connection closed by port 8465 [preauth]
Jul 8 14:13:29 skapet sshd[42324]: Invalid user is from port 54112
Jul 8 14:13:29 skapet sshd[42324]: Failed password for invalid user is from port 54112 ssh2
Jul 8 14:13:29 skapet sshd[42324]: Connection closed by port 54112 [preauth]
Jul 8 14:22:20 skapet sshd[83537]: Invalid user <style from port 41269
Jul 8 14:22:20 skapet sshd[83537]: Failed password for invalid user <style from port 41269 ssh2
Jul 8 14:22:21 skapet sshd[83537]: Connection closed by port 41269 [preauth]
Jul 8 14:31:06 skapet sshd[53939]: Invalid user body{margin from port 10587
Jul 8 14:31:06 skapet sshd[53939]: Failed password for invalid user body{margin from port 10587 ssh2
Jul 8 14:31:07 skapet sshd[53939]: Connection closed by port 10587 [preauth]
Jul 8 14:40:08 skapet sshd[24320]: Connection closed by port 58537 [preauth]
Jul 8 14:48:57 skapet sshd[97150]: Invalid user fieldset{padding from port 11375
Jul 8 14:48:57 skapet sshd[97150]: Failed password for invalid user fieldset{padding from port 11375 ssh2
Jul 8 14:48:58 skapet sshd[97150]: Connection closed by port 11375 [preauth]
Jul 8 14:57:55 skapet sshd[38951]: Invalid user 10px from port 43776
Jul 8 14:57:55 skapet sshd[38951]: Failed password for invalid user 10px from port 43776 ssh2
Jul 8 14:57:55 skapet sshd[38951]: Connection closed by port 43776 [preauth]
Jul 8 15:07:53 skapet sshd[72492]: Invalid user \^M from port 58382
Jul 8 15:07:53 skapet sshd[72492]: Failed password for invalid user \^M from port 58382 ssh2
Jul 8 15:07:53 skapet sshd[72492]: Failed password for invalid user \^M from port 58382 ssh2
Jul 8 15:07:54 skapet sshd[72492]: Connection closed by port 58382 [preauth]
Jul 8 15:17:05 skapet sshd[68616]: Invalid user 0 from port 3795
Jul 8 15:17:05 skapet sshd[68616]: Failed password for invalid user 0 from port 3795 ssh2
Jul 8 15:17:05 skapet sshd[68616]: Connection closed by port 3795 [preauth]
Jul 8 15:26:09 skapet sshd[14795]: Connection closed by port 59139 [preauth]
Jul 8 15:35:04 skapet sshd[8499]: Invalid user #header{width from port 16030
Jul 8 15:35:04 skapet sshd[8499]: Failed password for invalid user #header{width from port 16030 ssh2
Jul 8 15:44:12 skapet sshd[17233]: Invalid user 2% from port 2551
Jul 8 15:44:12 skapet sshd[17233]: Failed password for invalid user 2% from port 2551 ssh2
Jul 8 15:44:13 skapet sshd[17233]: Connection closed by port 2551 [preauth]
Jul 8 15:53:05 skapet sshd[36380]: Invalid user 2%;font-family from port 35369
Jul 8 15:53:05 skapet sshd[36380]: Failed password for invalid user 2%;font-family from port 35369 ssh2
Jul 8 15:53:05 skapet sshd[36380]: Connection closed by port 35369 [preauth]
Jul 8 16:02:05 skapet sshd[5384]: Invalid user Verdana, from port 10140
Jul 8 16:02:05 skapet sshd[5384]: Failed password for invalid user Verdana, from port 10140 ssh2
Jul 8 16:02:06 skapet sshd[5384]: Connection closed by port 10140 [preauth]
Jul 8 16:11:27 skapet sshd[80640]: Invalid user #content{margin from port 27941
Jul 8 16:11:27 skapet sshd[80640]: Failed password for invalid user #content{margin from port 27941 ssh2
Jul 8 16:20:24 skapet sshd[71772]: Invalid user <div from port 5467
Jul 8 16:20:24 skapet sshd[71772]: Failed password for invalid user <div from port 5467 ssh2
Jul 8 16:20:25 skapet sshd[71772]: Connection closed by port 5467 [preauth]
Jul 8 16:29:31 skapet sshd[22288]: Invalid user Error</h1></div>\^M from port 2932
Jul 8 16:29:31 skapet sshd[22288]: Failed password for invalid user Error</h1></div>\^M from port 2932 ssh2
Jul 8 16:29:31 skapet sshd[22288]: Connection closed by port 2932 [preauth]
Jul 8 16:38:32 skapet sshd[64659]: Invalid user id="content">\^M from port 44037
Jul 8 16:38:32 skapet sshd[64659]: Failed password for invalid user id="content">\^M from port 44037 ssh2
Jul 8 16:38:33 skapet sshd[64659]: Connection closed by port 44037 [preauth]
Jul 8 16:47:47 skapet sshd[60396]: Invalid user class="content-container"><fieldset>\^M from port 50741
Jul 8 16:47:47 skapet sshd[60396]: Failed password for invalid user class="content-container"><fieldset>\^M from port 50741 ssh2
Jul 8 16:56:46 skapet sshd[84720]: Invalid user Access from port 56868
Jul 8 16:56:46 skapet sshd[84720]: Failed password for invalid user Access from port 56868 ssh2
Jul 8 16:56:46 skapet sshd[84720]: Connection closed by port 56868 [preauth]
Jul 8 17:05:47 skapet sshd[39792]: Invalid user denied.</h2>\^M from port 55262
Jul 8 17:05:47 skapet sshd[39792]: Failed password for invalid user denied.</h2>\^M from port 55262 ssh2
Jul 8 17:05:47 skapet sshd[39792]: Connection closed by port 55262 [preauth]
Jul 8 17:14:42 skapet sshd[2165]: Invalid user do from port 16650
Jul 8 17:14:43 skapet sshd[2165]: Failed password for invalid user do from port 16650 ssh2
Jul 8 17:14:43 skapet sshd[2165]: Connection closed by port 16650 [preauth]
Jul 8 17:23:39 skapet sshd[45938]: Invalid user have from port 15855
Jul 8 17:23:39 skapet sshd[45938]: Failed password for invalid user have from port 15855 ssh2
Jul 8 17:23:39 skapet sshd[45938]: Connection closed by port 15855 [preauth]
Jul 8 17:32:35 skapet sshd[64595]: Invalid user to from port 64962
Jul 8 17:32:35 skapet sshd[64595]: Failed password for invalid user to from port 64962 ssh2
Jul 8 17:32:35 skapet sshd[64595]: Connection closed by port 64962 [preauth]
Jul 8 17:41:30 skapet sshd[99157]: Invalid user this from port 63460
Jul 8 17:41:30 skapet sshd[99157]: Failed password for invalid user this from port 63460 ssh2 

Jul 8 17:41:30 skapet sshd[99157]: Connection closed by port 63460 [preauth]
Jul 8 17:50:27 skapet sshd[60500]: Invalid user or from port 47364
Jul 8 17:50:27 skapet sshd[60500]: Failed password for invalid user or from port 47364 ssh2
Jul 8 17:50:27 skapet sshd[60500]: Connection closed by port 47364 [preauth]
Jul 8 17:59:26 skapet sshd[57379]: Invalid user using from port 60084
Jul 8 17:59:26 skapet sshd[57379]: Failed password for invalid user using from port 60084 ssh2
Jul 8 17:59:26 skapet sshd[57379]: Connection closed by port 60084 [preauth]
Jul 8 18:08:22 skapet sshd[64892]: Invalid user credentials from port 18558
Jul 8 18:08:22 skapet sshd[64892]: Failed password for invalid user credentials from port 18558 ssh2
Jul 8 18:08:22 skapet sshd[64892]: Connection closed by port 18558 [preauth]
Jul 8 18:17:19 skapet sshd[22377]: Invalid user you from port 46996
Jul 8 18:17:19 skapet sshd[22377]: Failed password for invalid user you from port 46996 ssh2
Jul 8 18:17:19 skapet sshd[22377]: Connection closed by port 46996 [preauth]
Jul 8 18:24:50 skapet sshd[98670]: Connection closed by port 40682 [preauth]

The other IP address offered up:

[Thu Dec 22 19:39:24] peter@skapet:~$ zgrep /var/log/authlog.23.gz
Jul 8 16:10:42 skapet sshd[79062]: Connection closed by port 61453 [preauth]
Jul 8 17:01:03 skapet sshd[28839]: Connection closed by port 59520 [preauth]
Jul 8 17:49:47 skapet sshd[1472]: Connection closed by port 39552 [preauth]
Jul 8 18:34:12 skapet sshd[58208]: Connection closed by port 59520 [preauth]
Jul 8 19:19:12 skapet sshd[93151]: Connection closed by port 6465 [preauth]
Jul 8 20:07:33 skapet sshd[84813]: Connection closed by port 39552 [preauth]
Jul 8 20:59:08 skapet sshd[11083]: Invalid user content="text/html; from port 26240
Jul 8 20:59:08 skapet sshd[11083]: Failed password for invalid user content="text/html; from port 26240 ssh2
Jul 8 20:59:08 skapet sshd[11083]: Connection closed by port 26240 [preauth]
Jul 8 21:47:54 skapet sshd[50641]: Connection closed by port 59520 [preauth]
Jul 8 22:38:16 skapet sshd[33990]: Invalid user Forbidden from port 64640
Jul 8 22:38:16 skapet sshd[33990]: Failed password for invalid user Forbidden from port 64640 ssh2
Jul 8 22:38:16 skapet sshd[33990]: Connection closed by port 64640 [preauth]
Jul 8 23:29:47 skapet sshd[84765]: Invalid user is from port 49280
Jul 8 23:29:47 skapet sshd[84765]: Failed password for invalid user is from port 49280 ssh2
Jul 8 23:29:47 skapet sshd[84765]: Connection closed by port 49280 [preauth]
Jul 9 00:18:32 skapet sshd[75290]: Connection closed by port 13952 [preauth]
Jul 9 01:07:11 skapet sshd[15889]: Connection closed by port 16000 [preauth]
Jul 9 01:54:19 skapet sshd[3570]: Connection closed by port 22144 [preauth]
Jul 9 02:38:44 skapet sshd[212]: Connection closed by port 57472 [preauth]
Jul 9 03:24:00 skapet sshd[38938]: Connection closed by port 50304 [preauth]
Jul 9 04:08:22 skapet sshd[60530]: Connection closed by port 21481 [preauth]
Jul 9 04:55:14 skapet sshd[77880]: Connection closed by port 40064 [preauth]
Jul 9 05:45:26 skapet sshd[65360]: Invalid user 0;color from port 20096
Jul 9 05:45:26 skapet sshd[65360]: Failed password for invalid user 0;color from port 20096 ssh2
Jul 9 05:45:26 skapet sshd[65360]: Connection closed by port 20096 [preauth]
Jul 9 06:35:50 skapet sshd[49775]: Invalid user #header{width from port 8320
Jul 9 06:35:50 skapet sshd[49775]: Failed password for invalid user #header{width from port 8320 ssh2
Jul 9 07:24:21 skapet sshd[88261]: Invalid user 2% from port 57472
Jul 9 07:24:21 skapet sshd[88261]: Failed password for invalid user 2% from port 57472 ssh2
Jul 9 07:24:22 skapet sshd[88261]: Connection closed by port 57472 [preauth]
Jul 9 08:16:55 skapet sshd[79482]: Invalid user 2%;font-family from port 57984
Jul 9 08:16:55 skapet sshd[79482]: Failed password for invalid user 2%;font-family from port 57984 ssh2
Jul 9 08:16:55 skapet sshd[79482]: Connection closed by port 57984 [preauth]
Jul 9 09:05:58 skapet sshd[67909]: Connection closed by port 38016 [preauth]
Jul 9 09:57:24 skapet sshd[51227]: Connection closed by port 22144 [preauth]
Jul 9 10:47:35 skapet sshd[89081]: Invalid user 

The sequences become a little easier to read if we extract the user field from the "Invalid user ..." messages:

[Thu Dec 22 20:23:23] peter@skapet:~$ zgrep /var/log/authlog.23.gz | grep Invalid | awk '{print $8}'
<html <meta content="text/html;
10px \^M
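The zgrep | grep | awk pipeline above can be mirrored in a few lines of Python; the sample lines here are copied from the log excerpt earlier in this post:

```python
import re

LOG_LINES = [
    "Jul 8 12:53:17 skapet sshd[88344]: Invalid user <!DOCTYPE from port 57999",
    "Jul 8 13:02:15 skapet sshd[85203]: Invalid user PUBLIC from port 58123",
    "Jul 8 13:11:13 skapet sshd[25261]: Invalid user XHTML from port 57227",
    "Jul 8 13:28:58 skapet sshd[96899]: Invalid user <html from port 48462",
]

# Rough equivalent of: grep Invalid | awk '{print $8}'
def invalid_users(lines):
    return [m.group(1) for line in lines
            if (m := re.search(r"Invalid user (\S+) from port", line))]

print(" ".join(invalid_users(LOG_LINES)))
# → <!DOCTYPE PUBLIC XHTML <html
```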

Now looking at what came from the other IP address we get

[Fri Dec 23 00:51:28] peter@skapet:~$ zgrep /var/log/authlog.23.gz | grep Invalid | awk '{print $8}'

Well, definitely HTML-ish, but no proper document start. Distinctly odd.

The two machines involved are apparently far apart geographically (one in Brazil and the other in Malaysia if the data from whois is anything to go by). It is of course possible that they're still connected somehow, perhaps compromised by the same group of cybercriminals and run by the same operators.

These operators then fatfingered some command or other, and their charges started pushing bizarre streams of HTML at unsuspecting SSH servers in the great elsewhere of the Internet. What they were actually trying to achieve I suspect we'll never know, but HTML was involved.

HTML is also part of the problem in one of the other bizarre phenomena I find at semi-random intervals in the SSH authentication logs.

Here are some samples from the same preserved log file:

[Thu Dec 22 20:16:36] peter@skapet:~$ zgrep "Bad protocol" /var/log/authlog.23.gz
Jul  6 19:03:17 skapet sshd[28549]: Bad protocol version identification 'GET / HTTP/1.1' from port 36147
Jul  8 16:14:07 skapet sshd[89181]: Bad protocol version identification 'GET / HTTP/1.1' from port 49385
Jul  8 20:15:15 skapet sshd[28469]: Bad protocol version identification 'GET / HTTP/1.1' from port 55039
Jul  9 10:56:40 skapet sshd[67430]: Bad protocol version identification 'GET / HTTP/1.1' from port 52504

Again, it's not clear what these operators were attempting to do, but to me this looks like they were expecting to find either a web server or perhaps a web proxy listening on port 22.

Just like the first kind of web-to-ssh stupidity this won't actually get the requester anything and you can safely ignore both kinds of activity if you see traces of them in your own logs.

That is, if you have seen something similar in your own logs and you would care to share, I would like to hear from you, via email or in the comments below.

Even more so if you have any input on the question 'what were these clowns trying to achieve?'.

If the log analyses or related activities turn up any useful insights, you won't need to go far from here to check the results.

Good night and good luck.

Update 2016-12-23: Among the various comments that the initial version of this piece generated, two stand out as particularly useful.

The first came in a Facebook comment on my post about the story there, from my former colleague Egil Mõller, who wrote:

"Maybe they read password guesses to try from a central REST service, but that service has somehow broken, and is serving a default error message (which, since it's REST, is most likely in html)?"

The other came from OpenBSD developer Stuart Henderson, who tweeted:

If the link Twitter gave me doesn't work, here's the plaintext of Stuart's tweet:

"@pitrh the GETs aren't all that odd - could easily be scanning via proxy. I see similar on SMTP too. Not sure about the html though.
1:00 PM - 23 Dec 2016 "

I must admit I had not noticed any GETs in any SMTP related logs on my systems, but now I am honor bound to check. Stuart has given me a task, and I must finish it separately.

Also, I really like Egil's input here, because it fits so well with the data we have. In the meantime I discovered more data from a second host. Unfortunately the actual logs had been rotated out of existence, but it was still possible to piece together data on failed logins from the summaries logsentry sends me.

It appears that one machine apparently located in Hong Kong that had been trying a few logins earlier that month, with no apparent success, started spitting HTML at roughly the same time the other two did.

Here is all the activity in early July 2016 from that host:

Jul  4 01:08:19 delilah sshd[13635]: Failed password for invalid user bob from port 38728 ssh2
Jul  4 01:18:28 delilah sshd[26798]: Failed password for root from port 9224 ssh2
Jul  4 01:18:29 delilah sshd[26798]: Failed password for root from port 9224 ssh2
Jul  4 01:27:13 delilah sshd[24543]: Failed password for invalid user ts from port 9224 ssh2
Jul  4 01:35:54 delilah sshd[16906]: Failed password for invalid user pi from port 9224 ssh2
Jul  7 18:30:48 skapet sshd[89165]: Failed password for invalid user admin from port 43498 ssh2
Jul  7 18:40:33 skapet sshd[35698]: Failed password for invalid user lp from port 9224 ssh2
Jul  7 18:50:21 skapet sshd[21112]: Failed password for root from port 9224 ssh2
Jul  9 11:33:25 delilah sshd[10234]: Failed password for invalid user <!DOCTYPE from port 46959 ssh2
Jul  9 11:42:28 delilah sshd[10230]: Failed password for invalid user PUBLIC from port 9224 ssh2
Jul  9 11:51:36 delilah sshd[25489]: Failed password for invalid user XHTML from port 9224 ssh2
Jul  9 12:09:47 delilah sshd[28023]: Failed password for invalid user <html from port 9224 ssh2
Jul  9 12:18:51 delilah sshd[9873]: Failed password for invalid user <meta from port 9224 ssh2
Jul  9 12:28:01 delilah sshd[13890]: Failed password for invalid user content="text/html; from port 9224 ssh2
Jul  9 12:37:01 delilah sshd[10856]: Failed password for invalid user <title>403 from port 9224 ssh2
Jul  9 12:46:04 delilah sshd[19947]: Failed password for invalid user Forbidden from port 9224 ssh2
Jul  9 13:04:32 delilah sshd[22444]: Failed password for invalid user <style from port 9224 ssh2
Jul  9 13:13:14 delilah sshd[15268]: Failed password for invalid user body{margin from port 9224 ssh2
Jul  9 13:22:31 delilah sshd[19611]: Failed password for invalid user Helvetica, from port 9224 ssh2
Jul  9 13:31:35 delilah sshd[15652]: Failed password for invalid user fieldset{padding from port 59101 ssh2
Jul  9 13:40:39 delilah sshd[15607]: Failed password for invalid user 10px from port 9224 ssh2
Jul  9 13:51:00 delilah sshd[18900]: Failed password for invalid user \^M from port 9224 ssh2
Jul  9 13:51:00 delilah sshd[18900]: Failed password for invalid user \^M from port 9224 ssh2
Jul  9 14:00:09 delilah sshd[24794]: Failed password for invalid user 0 from port 9224 ssh2
Jul  9 14:18:23 delilah sshd[22897]: Failed password for invalid user #header{width from port 9224 ssh2
Jul  9 14:27:11 delilah sshd[12713]: Failed password for invalid user 2% from port 9224 ssh2
Jul  9 14:35:56 delilah sshd[32320]: Failed password for invalid user 2%;font-family from port 9224 ssh2
Jul  9 14:44:46 delilah sshd[30676]: Failed password for invalid user Verdana, from port 9224 ssh2
Jul  9 14:53:46 delilah sshd[12799]: Failed password for invalid user #content{margin from port 9224 ssh2
Jul  9 15:02:29 delilah sshd[19535]: Failed password for invalid user <div from port 9224 ssh2
Jul  9 15:11:17 delilah sshd[6404]: Failed password for invalid user Error</h1></div>\^M from port 9224 ssh2
Jul  9 15:20:08 delilah sshd[2837]: Failed password for invalid user id="content">\^M from port 9224 ssh2
Jul  9 15:29:13 delilah sshd[7831]: Failed password for invalid user class="content-container"><fieldset>\^M from port 9224 ssh2
Jul  9 15:38:05 delilah sshd[12172]: Failed password for invalid user Access from port 9224 ssh2
Jul  9 15:46:56 delilah sshd[23460]: Failed password for invalid user denied.</h2>\^M from port 9224 ssh2
Jul  9 15:55:48 delilah sshd[19891]: Failed password for invalid user do from port 9224 ssh2
Jul  9 16:13:29 delilah sshd[5999]: Failed password for invalid user to from port 9224 ssh2
Jul  9 16:22:20 delilah sshd[17535]: Failed password for invalid user this from port 9224 ssh2
Jul  9 16:31:11 delilah sshd[8428]: Failed password for invalid user or from port 9224 ssh2
Jul  9 16:40:00 delilah sshd[2522]: Failed password for invalid user using from port 9224 ssh2
Jul  9 16:48:52 delilah sshd[15034]: Failed password for invalid user credentials from port 9224 ssh2
Jul  9 16:57:43 delilah sshd[14515]: Failed password for invalid user you from port 53073 ssh2

If we again extract only the user name field, we get:


From this we see that this host was already busy poking us at long intervals with 'likely' user names, then suddenly started spewing HTML instead, in roughly the same time frame as the other two.

Seen from my perch here, this serves to support Egil's suggestion that a misbehaving common back end system is what caused the odd activity we're seeing.

And the apparent coordination brings to mind the Hail Mary Cloud incidents that we reported on earlier.

I suppose further digging in the logs is warranted.

If you would like to join me in the hunt for more of this, please let me know.

by Peter N. M. Hansteen ( at December 22, 2016 08:29 PM

December 21, 2016

No leap second simulations this year

As some of my readers already know, I changed jobs in November: I left Opera Software to join Telenor Digital. We have decided not to run any leap second simulation here, so I am not going to publish anything on the subject this year. You can still refer to the post The leap second aftermath for some suggestions I wrote after the latest leap second we had in June/July 2015.

Good luck!

Tagged: DevOps, leap second, ntp, Opera, Sysadmin, Telenor Digital

by bronto at December 21, 2016 09:21 PM

December 20, 2016

OpenStack: Quick and automatic instance snapshot backup and restore (and before an apt upgrade) with nova backup

This is a guide that shows you how to create OpenStack instance snapshots automatically, quickly and easily. This allows you to create a full backup of the entire instance. This guide has a script that makes creating snapshots from an OpenStack VM automatic via cron. The script uses the `nova backup` function, therefore it also has retention and rotation of the backups. It also features an option to create a snapshot before every apt action (upgrade/install/remove). This way, you can easily restore from the snapshot when something goes wrong after an upgrade. Snapshots are very useful for restoring the entire instance to an earlier state. Do note that this is not the same as a file-based backup: you can't select a few files to restore, it's all or nothing.

December 20, 2016 12:00 AM

December 19, 2016


Sysadmins and risk-management

This crossed my timeline today:

This is a risk-management statement that contains all of a sysadmin's cynical-bastard outlook on IT infrastructure.

Disappointed because all of their suggestions for making the system more resilient to failure are shot down by management. Or only some of them are, which might as well be all, since some disasters remain uncovered. Commence drinking heavily to compensate.

Frantically busy because they're trying to mitigate all the failure-modes their own damned self using not enough resources, all the while dealing with continual change as the mission of the infrastructure shifts over time.

A good #sysadmin always expects the worst.

Yes, we do. Because all too often, we're the only risk-management professionals a system has. We understand the risks to the system better than anyone else. A sysadmin who plans for failure is one who isn't first on the block when the outage-enraged user-base calls for a beheading.

However, there are a few failure-modes in this setup that many, many sysadmins fall foul of.

Perfection is the standard.

And no system is perfect.

Humans are shit at gut-level risk-assessment, part 1: If you've had friends eaten by a lion, you see lions everywhere.

This abstract threat has been made all too real, and now lions. Lions everywhere. For sysadmins it's things like multi-disk RAID failures, UPS batteries blowing up, and restoration failures because an application changed its behavior and the existing backup solution no longer was adequate to restore state.

Sysadmins become sensitized to failure. Those once-in-ten-years failures, like datacenter transfer-switch failures or Amazon region-outages, seem immediate and real. I knew a sysadmin who was paralyzed in fear over a multi-disk RAID failure in their infrastructure. They used big disks that weren't 100% 'enterprise' grade. Recoveries from a single-disk failure were long as a result. Too long. A disk going bad during the recovery was a near certainty from their point of view, never mind that the disks in question were less than 3 years old and the RAID system they were using ran bad-block detection as a background process. That window of outage was too damned long.

Humans are shit at gut-level risk-assessment, part 2: Leeroy Jenkins sometimes gets the jackpot, so maybe you'll get that lucky...

This is why people think they can win mega-million lotteries or beat the casino at roulette. Because sometimes, you have to take a risk for a big payoff.

To sysadmins who have had friends eaten by lions, this way of thinking is completely alien. This is the developer who suggests swapping out the quite functional MySQL databases for Postgres. Or the peer sysadmin who really wants central IT to move away from forklift SAN-based disk-arrays for a bunch of commodity hardware, FreeBSD, and ZFS.

Mm hm. No.

Leeroy Jenkins management and lion-eaten sysadmins make for really unhappy sysadmins.

When it isn't a dev or a peer sysadmin asking, but a manager...

Sysadmin team: It may be a better solution. But do you know how many lions are lurking in the transition process??

Management team: It's a better platform. Do it anyway.

Cue heavy drinking as everyone prepares to lose a friend to lions.

This is why I suggest rewording that statement:

A good #sysadmin always expects the worst.
A great #sysadmin doesn't let that rule their whole outlook.

A great sysadmin has awareness of business risk, not just IT risks. A sysadmin who has been scarred by lions and sees large felines lurking everywhere will be completely miserable in an early or mid-stage startup. In an early stage startup, the big risk on everyone's mind is running out of money and losing their jobs; so that once-in-three-years disaster we feel so acutely is not the big problem it seems. Yeah, it can happen, and it could shutter the company if it does; but the money spent remediating that problem would be better spent expanding marketshare enough that we can assume we'll still be in business 2 years from now. A failure-obsessed sysadmin will not find job satisfaction in such a workplace.

One who has awareness of business risk will wait until the funding runway is long enough that pitching redundancy improvements will actually defend the business. This is a hard skill to learn, especially for people who've been pigeon-holed as worker-units their entire career. I find that asking myself one question helps:

How likely is it that this company will still be here in 2 years? 5? 7? 10?

If the answer to that is anything less than 'definitely', then there are failures that you can accept into your infrastructure.

by SysAdmin1138 at December 19, 2016 04:14 PM

December 15, 2016

Debian Administration

Installing the Go programming language on Debian GNU/Linux

Go is an open source programming language that makes it easy to build simple, reliable, and efficient software. In this brief article we'll show how to install binary releases of the compiler/toolset, and test them.
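A manual install of a binary release typically looks like the sketch below; the 1.7.4 version string is an assumption of mine, so check the official download page for the current release and its checksum before fetching.

```shell
#!/bin/sh
# Assumed release; adjust to the current version from the downloads page.
GO_VERSION="1.7.4"
TARBALL="go${GO_VERSION}.linux-amd64.tar.gz"

# The actual install steps (commented out here, as they need network
# access and root):
#   wget "https://storage.googleapis.com/golang/${TARBALL}"
#   sudo tar -C /usr/local -xzf "${TARBALL}"
#   export PATH="$PATH:/usr/local/go/bin"
#   go version
echo "would fetch ${TARBALL}"   # echoed so the sketch has visible output
```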

by Steve at December 15, 2016 08:25 AM

December 14, 2016

Carl Chenet

Feed2tweet 0.8, tool to post RSS feeds to Twitter, released

Feed2tweet 0.8, a self-hosted Python app to automatically post RSS feeds to the Twitter social network, was released on December 14th.

With this release Feed2tweet now manages hashtags smartly, adding as many as possible given the size of the tweet.

Two new options are also available:

  • --populate-cache to retrieve the last entries of the RSS feeds and store them in the local cache file without posting them on Twitter
  • --rss-sections to display the available sections in the RSS feed, allowing you to use these section names in your tweet format (see the official documentation for more details)

Feed2tweet 0.8 is already in production for Le Journal du hacker, a French-speaking Hacker News-like website, for a French-speaking job board, and for this very blog.


What’s the purpose of Feed2tweet?

Some online services offer to convert your RSS entries into Twitter posts. These services are usually unreliable, slow, and don't respect your privacy. Feed2tweet is a self-hosted Python app; the source code is easy to read and the official documentation is available online with lots of examples.

Twitter Out Of The Browser

Have a look at my Github account for my other Twitter automation tools:

  • Retweet, retweet all tweets (or a filtered subset) from one Twitter account to another to spread content.
  • db2twitter, get data from a SQL database (several are supported), build tweets and send them to Twitter.
  • Twitterwatch, monitor the activity of your Twitter timeline and warn you if no new tweet appears.

What about you? Do you use tools to automate the management of your Twitter account? Feel free to give me feedback in the comments below.

by Carl Chenet at December 14, 2016 11:00 PM

December 13, 2016


An update on my Choria project

Some time ago I mentioned that I am working on improving the MCollective Deployment story.

I started a project called Choria that aimed to massively improve the deployment UX and yield a secure and stable MCollective setup for those using Puppet 4.

The aim is to make installation quick and secure. Towards that, it seems a common end-to-end install from scratch, by someone new to the project, using a clustered NATS setup, can take less than an hour; this is a huge improvement.

Further, I’ve had really good user feedback, especially around NATS. One user reports 2000 nodes on a single NATS server consuming 300MB of RAM and being very performant, much more so than the previous setup.

It’s been a few months; this is what’s changed:

  • The module now supports every OS AIO Puppet supports, including Windows.
  • Documentation is available on, and installation should take about an hour at most.
  • The PQL language can now be used to do completely custom infrastructure discovery against PuppetDB.
  • Many bugs have been fixed; many things have been streamlined and made easier to get going with, thanks to better defaults.
  • Event Machine is not needed anymore.
  • A number of POC projects have been done to flesh out next steps, things like a very capable playbook system and a revisit to the generic RPC client, these are on GitHub issues.

Meanwhile I am still trying to get to a point where I can take over maintenance of MCollective again. At first Puppet Inc was very open to the idea, but I’m afraid it’s been 7 months and it’s getting nowhere; calls for cooperation are just being ignored. Unfortunately I think we’re getting pretty close to a fork being the only productive next step.

For now though, I’d say the Choria plugin set is production-ready and stable; anyone using Puppet 4 AIO should consider using it – it’s about the only working way to get MCollective on FOSS Puppet now, given the state of the other installation options.

by R.I. Pienaar at December 13, 2016 06:49 PM

Geek and Artist - Tech

“We’re doing our own flavour of Agile/Scrum”

I won’t descend into hyperbole and say you should run, shrieking and naked into the dark night, when you hear these words. But, it’s worth pondering what exactly it means. I think I’ve (over)used this phrase myself plenty over the years and right now find myself examining why so many people find themselves needing to invent their own version of well accepted software workflow methodologies.

You might say “we just pick the parts that work for us” or “we continually iterate on our workflow and so it is constantly evolving rather than sticking to static definitions of Agile”, or “we haven’t found estimations useful”. Many teams that have a significant infrastructure component to their work find themselves split between Scrum and Kanban. I always imagine that traditional or strict Scrum works best when you are working on a single application and codebase, with pretty well restricted scope and limited technologies in play. I actually crave working in such an environment, since working in teams with broad missions and wide varieties of technologies can make organising the work extremely difficult. At the same time you don’t want to split a reasonably-sized team of 6-8 people into teams of 1-2 people just to make their mission/vision clear.

Some reasons I think “custom Agile/Scrum” happens:

  • Most or all of the team has never actually worked in a real Waterfall development model, and can’t appreciate the reason for all the Agile/Scrum rituals, processes and ideals. This will continue to happen more and more frequently, and is almost guaranteed if you are dealing with those dreaded Millennials.
  • Estimations are hard, and we’d rather not do them.
  • Backlog grooming is hard, and we don’t want to waste time on it. Meeting fatigue in general kills a lot of the rituals.
  • Unclear accountability on the team. Who does it fall on when we don’t meet our goals? What is the outcome?
  • Too many disparate streams of work to have one clear deliverable at the end of the sprint.
  • Various factors as mentioned in the previous paragraph leading to a hybrid Scrum-Kanban methodology being adopted.
  • The need to use electronic tools e.g. Jira/Trello/Mingle/TargetProcess (have you heard of that one?) etc, rather than old fashioned cards or sticky notes on the wall. Conforming to the constraints of your tool of choice (or lack of choice) inevitably make a lot of the rituals much harder. Aligning processes with other teams (sometimes on other continents) also adds to the friction.

So anyway, why is any of this a problem? Well, let’s consider for a moment what the purpose of these workflow tools and processes is. Or at least, in my opinion (and if you disagree, please let me know and let’s discuss it!):

  • Feedback
  • Learning
  • Improvement

I think these three elements are so important to a team, whether you implement Scrum or Kanban or something else. If you pick and choose from different agile methodologies, you’d better be sure you have all of these elements present. Let me give some examples of where the process fails.

You have a diverse team with a broad mission, and various roles like backend, frontend, QA, design etc. Not everybody is able to work on the same thing so your sprint goals look like five or 10 different topics. At the end of the sprint, maybe 70-80% of them are completed, but that’s ok right? You got the majority done – time to celebrate, demo what you did finish and move what’s left over to the next sprint.

Unfortunately what this does is create a habit of acceptable failure. You become accustomed to never completing all the sprint goals, moving tickets over to the following sprint and not reacting to it. Quarterly goals slip but that’s also acceptable. You take on additional “emergency” work into the sprint without much question as slipping from 70% to 65% isn’t a big difference. You’ve just lost one of your most important feedback mechanisms.

If you had a single concrete goal for the sprint, and held yourself to delivering that thing each sprint, you would instead build up the habit of success being normal. The first sprint where that single goal is not delivered gives you a huge red flag that something went wrong and there is a learning opportunity. What did you fail to consider in this sprint that caused the goal to be missed? Did you take on some emergency work that took longer than expected? It’s also a great opportunity for engineers to improve how they estimate their work and how they prioritise. It also facilitates better discussions around priorities – if you come to me and ask me to complete some “small” task, I will ask you to take on responsibility for the entire sprint goal being missed, and explaining that to the stakeholders. 100% to 0% is a much harder pill to swallow than 85% to 80% – and in the latter case I believe these important conversations just aren’t happening.

But let’s say Scrum really doesn’t work for you. I think that’s totally fine, as long as you own up to this and replace the feedback mechanisms of Scrum with that of something else – but not stay in some undefined grey area in the middle. Two-week (or some alternative time period) sprints may not work well, or you might deliver to production every week, or every three weeks. Something that doesn’t align with the sprint. Now you are in a situation where you are working in sprints/iterations that are just arbitrary time containers for work but aren’t set up to deliver you any valuable feedback. Don’t stay in the grey zone – own up to it and at least move to something else like Kanban.

But if you are using Kanban, do think about what feedback mechanisms you now need. Simply limiting work in progress and considering your workflow a pipeline doesn’t provide much intelligence to you about how well it is functioning. Measuring cycle time of tasks is the feedback loop here that tells you when things are going off the rails. If you get to the point where your cycle time is pretty consistent but you find your backlog is growing more and more out of control, you have scope creep or too much additional work is making its way into your team. Either way there is a potential conversation around priorities and what work is critical to the team’s success to be had. Alternatively if cycle time is all over the place then the team can learn from these poor estimates and improve their thought process around the work. Having neither cycle time nor sprint goal success adequately measured leaves you unable to judge healthy workflow or react to it when it could be improved.

I guess you could also disagree with all of this. I’d still argue that if you are in a business or venture that cares about being successful, you want to know that how you are going about your work actually matters. If it isn’t being done very efficiently, you want to know with reasonable certainty what part of your methodology is letting you down and respond to it. If you can’t put your finger on the problem and concretely say “this is wrong, let’s improve it” then you are not only avoiding potential success, but also missing out on amazing opportunities for learning and the challenge of solving interesting problems!

by oliver at December 13, 2016 06:32 PM

December 11, 2016

Toolsmith - GSE Edition: Image Steganography & StegExpose

Cross-posted on the Internet Storm Center Diary.

Updated with contest winners 14 DEC. Congrats to:
Chrissy @SecAssistance
Owen Yang @HomingFromWork
Paul Craddy @pcraddy
Mason Pokladnik - Fellow STI grad
Elliot Harbin @klax0ff

In the last of a three-part (Part 1-GCIH, Part 2-GCIA) series focused on tools I revisited during my GSE re-certification process, I thought it'd be timely and relevant to give you a bit of a walkthrough re: steganography tools. Steganography "represents the art and science of hiding information by embedding messages within other, seemingly harmless messages."
Stego has garnered quite a bit of attention again lately as party to both exploitation and exfiltration tactics. On 6 DEC 2016, ESET described millions of victims among readers of popular websites who had been targeted by the Stegano exploit kit hiding in pixels of malicious ads.
The Sucuri blog described credit card swipers in Magento sites on 17 OCT 2016, where attackers used image files as an obfuscation technique to hide stolen details from website owners, in images related to products sold on the victim website.

The GSE certification includes SANS 401 GSEC content, and Day 4 of the GSEC class content includes some time on steganography with the Image Steganography tool. Tools for steganographic creation are readily available, but a bit dated, including Image Steganography, last updated in 2011, and OpenStego, last updated in 2015. There are other, older command-line tools, but these two are really straightforward GUI-based options. Open source or free stego detection tools are unfortunately really dated and harder to find as a whole, unless you're a commercial tool user. StegExpose is one of a few open options that's fairly current (2015) and allows you to conduct steganalysis to detect LSB steganography in images. The LSB is the least significant bit in the byte value of an image pixel, and LSB-based image steganography embeds the hidden payload in the least significant bits of the pixel values of an image.
Image Steganography uses LSB steganography, making this a perfect opportunity to pit one against the other.
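To make the LSB idea concrete, here is a toy sketch of my own (the helper names are made up, and it operates on a single pixel byte rather than a real image): embedding overwrites the least significant bit of a pixel byte with one payload bit, so the byte value changes by at most 1 and the image looks unchanged.

```shell
#!/bin/sh
# Overwrite the least significant bit of a pixel byte with a payload bit.
embed_bit()   { echo $(( ($1 & ~1) | $2 )); }   # pixel byte + bit -> stego byte
extract_bit() { echo $(( $1 & 1 )); }           # stego byte -> payload bit

embed_bit 200 1     # pixel value 200 becomes 201, carrying a 1 bit
extract_bit 201     # recovers the 1
```

A real tool simply repeats this over enough pixel bytes to hold the whole payload, which is also why statistical detectors like StegExpose can spot it: the distribution of LSBs stops looking like natural image noise.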
Download Image Steganography from Codeplex, then run Image Steganography Setup.exe. Run Image Steganography after installation and select a PNG for your image. You can then type text you'd like to embed, or input data from a file. I chose wtf.png for my image, and rr.ps1 as my input file. I chose to write out the resulting stego sample to wtf2.png, as seen in Figure 1.

Figure 1: Image Steganography
This process in reverse, to decode a message, is just as easy. Select the decode radio button, and the UI will switch to decode mode. I dragged in the wtf2.png file I'd just created, and opted to write the output to the same directory, as seen in Figure 2.
Figure 2: wtf.png decoded

Pretty simple, and the extracted rr.ps1 file was unchanged from the original embedded file.
Now, will StegExpose detect this file as steganographic? Download StegExpose from Github, unpack, and navigate to the resulting directory from a command prompt. Run StegExpose.jar against the directory with your steganographic image as follows: java -jar StegExpose.jar c:\tmp\output. Sure enough, steganography confirmed as seen in Figure 3.
Figure 3: StegExpose
Not bad, right? Easy operations on both sides of the equation.

And now for a little contest. Five readers who email me via russ at holisticinfosec dot org and give me the most precise details regarding what I specifically hid in wtf2.png get a shout out here and $5 Starbucks gift cards for a little Christmastime caffeine.  

Contest: wtf2.png
Note: do not run the actual payload, it will annoy you to no end. If you must run it to decipher it, please do so in a VM. It's not malware, but again, it is annoying.

Cheers...until next time.

by Russ McRee at December 11, 2016 05:54 PM

December 09, 2016

Sean's IT Blog

Horizon 7 and App Volumes 2.x Updates

VMware has been committed to adding new features with every Horizon Suite point update, and the latest updates, announced yesterday, are no exception.

The Horizon 7.0.3 update, in conjunction with vSphere 6.5, adds several long-awaited features (announcement blog)(release notes).  These are:

  • Expanded Support for Windows 10
  • H.264 multimonitor support for Windows and Linux
  • A Universal Windows Platform Horizon Client for Blast Extreme
  • Linux Enhancements, including
    • Audio Input support
    • Ubuntu 16.04 Support
    • Clipboard Redirection for all supported versions
    • vGPU support for NVIDIA M6 GPUs for Red Hat Enterprise Linux desktops
  • Support for Windows Server 2016 Remote Desktop Session Hosts and single-use desktops

Two major vSphere 6.5 enhancements that impact Horizon were also highly touted yesterday.  The first is access to the Horizon API in PowerCLI 6.5.  This was released last week with vSphere 6.5, and both Alan Renouf and Thomas Brown have blog posts on how to access the API.  There is also a Github repository with examples of how to use the API.  This has been a long-awaited, and oft-requested, feature enhancement for Horizon.  While a View PowerCLI module has been included since View 4.5, it was very limited and hadn't been updated with the new features added to Horizon.  The new API access is very raw; there are currently only two cmdlets, and these are used for connecting to and disconnecting from the API.  However, I expect significant additions in future versions of PowerCLI.

The second big announcement is HA support for vGPU-enabled desktops in vSphere 6.5.  This is a huge announcement for customers that require vGPU for 3D workloads.  In previous versions of vSphere, if a host failed, vGPU-enabled desktops would not restart on another host.  This now provides some method of fault tolerance for these VMs.  vMotion is still not supported, and this is a much harder problem to tackle.

Also included with Horizon 7.0.3 is Access Point 2.8 and vRealize Operations for Horizon 6.4 (release notes).  vRealize Operations for Horizon includes several new features including:

  • Support for monitoring Horizon Access Point – including Access Point health and connection information
  • App Volumes support – monitoring which AppStacks are attached to a user session and how long they took to attach
  • New Widgets and reports on application usages in virtual desktop sessions
  • Support for monitoring Cloud Pod Architecture

App Volumes 2.12 was also released yesterday, and it brings significant improvements to the current branch of the application layering software. (Release notes)(Announcement Blog)

The new features in App Volumes are:

  • Logon Enhancements
  • Support for multiple domain controllers and multiple Active Directory forests and domains
  • Communications between App Volumes Manager and agent now default to HTTPS
  • Certificate validation required for communications between vCenter and App Volumes Manager
  • Support for Office 365 (2016) as an App Stack
  • Support for Windows 10 Anniversary Update (AKA Build 1607)

There are also a couple of new Tech Preview features that can be enabled in the latest version.  These features are:

by seanpmassey at December 09, 2016 01:10 PM

December 08, 2016


Announcement: “The Ultimate Game Boy Talk” at 33C3

I will present “The Ultimate Game Boy Talk” at the 33rd Chaos Communication Congress in Hamburg later in December.

If you are interested in attending the talk, please go to, select it and press submit, so the organizers can reserve a big enough room.

The talk continues the spirit of The Ultimate Commodore 64 Talk, which I presented at the same conference eight years ago, as well as several other talks in the series done by others: Atari 2600 (Svolli), Galaksija (Tomaž Šolc), Amiga 500 (rahra).

Here’s the abstract:

The 8-bit Game Boy was sold between 1989 and 2003, but its architecture more closely resembles machines from the early 1980s, like the Commodore 64 or the NES. This talk attempts to communicate “everything about the Game Boy” to the listener, including its internals and quirks, as well as the tricks that have been used by games and modern demos, reviving once more the spirit of times when programmers counted clock cycles and hardware limitations were seen as a challenge.

by Michael Steil at December 08, 2016 02:49 AM

December 07, 2016

Create a PDP-8 OS8 RK05 system disk from RX01 floppies with SIMH (and get text files in and out of the PDP-8)

This guide shows you how to build an RK05 bootable system disk with OS/8 on it for the PDP-8, in the SIMH emulator. We will use two RX01 floppies as the build source, copy over all the files and set up the LPT printer and the PTR/PIP paper tape punch/readers. As an added bonus, the article also shows you how to get text files in and out of the PDP-8 system using the printer and paper tape reader/puncher.

December 07, 2016 12:00 AM

December 06, 2016

Geek and Artist - Tech

What’s Missing From Online Javascript Courses

Perhaps the title is somewhat excessive, but it expresses how I feel about this particular topic. I’m not a “front-end person” (whatever that is) and feel much more comfortable behind an API where you don’t have to worry about design, markup, logic, styling as well as how to put them all together. That being said, I feel it’s an area where I should face my fears head-on, and so I’m doing some small side-projects on the web.

One thing that I realised I had no idea about, is how you actually get a web app running. I don’t mean starting a server, or retrieving a web page, or even which line of javascript is executed first. I’m talking about how you put all the scripts and bits of code in the right place that the browser knows when to run them, and you don’t make an awful mess in the process.

This can only be shown by example, so here’s what I inevitably start with:

    <script src="//" />
    <div id="data" />
    <script type="text/javascript">
      $.ajax({
        success: function(data) {
          $("data").text = data
        }
      })
    </script>

Yes, I’m using jQuery, and no, that code example is probably not entirely correct. I still find there is a reasonable period of experimentation involved before even the simple things, like an AJAX call to get some data from an API, are working. In any case, here we are with some in-line Javascript and things are generally working as expected. But of course we know that in-lining Javascript is not the way to a working, maintainable application, so as soon as we have something working, we should pull it into its own external script.

    <script src="//" />
    <script src="/javascripts/main.js" />
    <div id="data" />

Uh-oh, it stopped working. The code in main.js is the exact same as what we had in the document before but it is no longer functioning. Already, we are beyond what I’ve seen in most beginner Javascript online courses, yet this seems like a pretty fundamental issue. Of course, the reason is that the script has been loaded and executed in the same order as the script tags and before the HTML elements (including the div we are adding the data to) were present in the DOM.

So naturally we exercise jQuery and fix the problem, by only executing the code once the document is ready and the relevant event handler is fired:

    $(document).ready(function() {
      $.ajax({
        success: function(data) {
          $("data").text = data
        }
      })
    })

But now we have another problem. We’ve heard from more experienced developers that using jQuery is frowned upon, and although figuring out when the document is loaded seems simple enough to do without using a library, we’re not sure that there is a single cross-browser way of doing it reliably. So jQuery it is.

Actually there is another way, well explained here and seems to be well supported without relying on Javascript functions. You simply drop the “defer” keyword into the script tag you want to execute after parsing of the page, and it will now only run at the right time for our purposes:

    <script src="/javascripts/main.js" defer/>

I had never seen that before, but it was so simple. Many thanks to my coworkers Maria and Thomas for shedding a bit of light on this corner of the browser world for me. Of course, they also mentioned correctly that using jQuery is not an unforgivable sin, nor are some cases of in-line Javascript snippets (look at some of your favourite websites, even those from respected tech giants, and you will undoubtedly see some). But for a novice to web development it is sometimes hard to look beyond Hackernews and figure out what you are meant to be doing.

On to the next web challenge – mastering D3!

by oliver at December 06, 2016 08:54 PM

Multi-repo Git status checking script

I've got a whole bunch of Git repositories in my ~/Projects/ directory. All of those may have unstaged, uncommitted or unpushed changes. I find this hard to keep track of properly, so I wrote a script to do this for me. The output looks like this:


As you can see, it shows:

  • Untracked files: Files that are new, are unknown to git and have not been ignored.
  • Uncommitted changes: Files that are known to git and have changes which are not committed.
  • Needs push: New local commits which have not been pushed to the remote origin.

The script scans for .git directories under the given path, but only down to a certain depth. The default is 2, which means "all directories directly under this directory"; a value of 3 would scan two directories deep.
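The scanning step can be sketched like this (the function name is mine, not the script's): find the .git directories up to DEPTH levels below DIR and print each repository's working-tree path, ready for a `git status` per repo.

```shell
#!/bin/sh
# Find git repos: locate .git directories up to $2 levels below $1
# (default depth 2 = repos sitting directly under the given directory),
# and print the path of each working tree.
scan_repos() {
    dir="$1"; depth="${2:-2}"
    find "$dir" -maxdepth "$depth" -type d -name .git |
        while read -r gitdir; do
            printf '%s\n' "${gitdir%/.git}"
        done
}

# Usage, for example:
#   scan_repos ~/Projects | while read -r repo; do
#       git -C "$repo" status --short
#   done
```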

Full usage:

Usage: <DIR> [DEPTH=2]
Scan for .git dirs under DIR (up to DEPTH dirs deep) and show git status

Get it from the Github project page.

by admin at December 06, 2016 04:01 PM

December 03, 2016

Vincent Bernat

Build-time dependency patching for Android

This post shows how to patch an external dependency for an Android project at build-time with Gradle. This leverages the Transform API and Javassist, a Java bytecode manipulation tool.

buildscript {
    dependencies {
        classpath ''
        classpath ''
        classpath 'org.javassist:javassist:3.21.+'
        classpath 'commons-io:commons-io:2.4'
    }
}

Disclaimer: I am not a seasoned Android programmer, so take this with a grain of salt.


This section adds some context to the example. Feel free to skip it.

Dashkiosk is an application to manage dashboards on many displays. It provides an Android application you can install on one of those cheap Android sticks. Under the hood, the application is an embedded webview backed by the Crosswalk Project web runtime, which brings an up-to-date web engine even to older versions of Android[1].

Recently, a security vulnerability has been spotted in how invalid certificates were handled. When a certificate cannot be verified, the webview defers the decision to the host application by calling the onReceivedSslError() method:

Notify the host application that an SSL error occurred while loading a resource. The host application must call either callback.onReceiveValue(true) or callback.onReceiveValue(false). Note that the decision may be retained for use in response to future SSL errors. The default behavior is to pop up a dialog.

The default behavior is specific to the Crosswalk webview: the Android builtin one just cancels the load. Unfortunately, the fix applied by Crosswalk is different and, as a side effect, the onReceivedSslError() method is not invoked anymore[2].

Dashkiosk comes with an option to ignore TLS errors[3]. The mentioned security fix breaks this feature. The following example will demonstrate how to patch Crosswalk to recover the previous behavior[4].

Simple method replacement§

Let’s replace the shouldDenyRequest() method from the org.xwalk.core.internal.SslUtil class with this version:

// In SslUtil class
public static boolean shouldDenyRequest(int error) {
    return false;
}

Transform registration§

Gradle Transform API enables the manipulation of compiled class files before they are converted to DEX files. To declare a transform and register it, include the following code in your build.gradle:

import org.gradle.api.logging.Logger
import com.android.build.api.transform.*

class PatchXWalkTransform extends Transform {
    Logger logger = null;

    public PatchXWalkTransform(Logger logger) {
        this.logger = logger
    }

    @Override
    String getName() {
        return "PatchXWalk"
    }

    @Override
    Set<QualifiedContent.ContentType> getInputTypes() {
        return Collections.singleton(QualifiedContent.DefaultContentType.CLASSES)
    }

    @Override
    Set<QualifiedContent.Scope> getScopes() {
        return Collections.singleton(QualifiedContent.Scope.EXTERNAL_LIBRARIES)
    }

    @Override
    boolean isIncremental() {
        return true
    }

    @Override
    void transform(Context context,
                   Collection<TransformInput> inputs,
                   Collection<TransformInput> referencedInputs,
                   TransformOutputProvider outputProvider,
                   boolean isIncremental) throws IOException, TransformException, InterruptedException {
        // We should do something here
    }
}

// Register the transform
android.registerTransform(new PatchXWalkTransform(logger))

The getInputTypes() method should return the set of types of data consumed by the transform. In our case, we want to transform classes. Another possibility is to transform resources.

The getScopes() method should return a set of scopes for the transform. In our case, we are only interested in the external libraries. It’s also possible to transform our own classes.

The isIncremental() method returns true because we support incremental builds.

The transform() method is expected to take all the provided inputs and copy them (with or without modifications) to the location supplied by the output provider. We haven’t implemented this method yet, so as it stands, the transform removes all external dependencies from the application.

Noop transform§

To keep all external dependencies unmodified, we must copy them:

void transform(Context context,
               Collection<TransformInput> inputs,
               Collection<TransformInput> referencedInputs,
               TransformOutputProvider outputProvider,
               boolean isIncremental) throws IOException, TransformException, InterruptedException {
    inputs.each {
        it.jarInputs.each {
            def jarName = it.name
            def src = it.getFile()
            def dest = outputProvider.getContentLocation(jarName,
                                                         it.contentTypes, it.scopes,
                                                         Format.JAR)
            def status = it.getStatus()
            if (status == Status.REMOVED) { // ❶
                logger.info("Remove ${src}")
                dest.delete()
            } else if (!isIncremental || status != Status.NOTCHANGED) { // ❷
                logger.info("Copy ${src}")
                FileUtils.copyFile(src, dest)
            }
        }
    }
}

We also need a few additional imports:

import com.android.build.api.transform.Format
import com.android.build.api.transform.Status
import org.apache.commons.io.FileUtils
Since we are handling external dependencies, we only have to manage JAR files. Therefore, we only iterate on jarInputs and not on directoryInputs. There are two cases when handling incremental build: either the file has been removed (❶) or it has been modified (❷). In all other cases, we can safely assume the file is already correctly copied.
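The copy/remove decision above can be sketched language-neutrally. Here is a small Python model of it — the status names mirror Gradle’s Status enum, but this is not the Gradle API, just an illustration of the three branches:

```python
# Decide what to do with one JAR input, mirroring the Groovy logic above.
# Status names follow Gradle's Status enum; "delete"/"copy"/"skip" are
# simply labels for the three branches.
def action(status, is_incremental):
    if status == "REMOVED":
        return "delete"      # drop the stale copy from the output (like ❶)
    if not is_incremental or status != "NOTCHANGED":
        return "copy"        # full build, or the file was added/changed (like ❷)
    return "skip"            # incremental build and the file is untouched
```

On a non-incremental build, everything that still exists gets copied; only incremental builds can take the cheap "skip" path.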

JAR patching§

When the external dependency is the Crosswalk JAR file, we also need to modify it. Here is the first part of the code (replacing ❷):

if ("${src}" ==~ ".*/org.xwalk/xwalk_core.*/classes.jar") {
    def pool = new ClassPool()
    def ctc = pool.get('org.xwalk.core.internal.SslUtil') // ❸

    def ctm = ctc.getDeclaredMethod('shouldDenyRequest')
    ctc.removeMethod(ctm) // ❹

public static boolean shouldDenyRequest(int error) {
    return false;
""", ctc)) // ❺

    def sslUtilBytecode = ctc.toBytecode() // ❻

    // Write back the JAR file
    // …
} else {"Copy ${src}")
    FileUtils.copyFile(src, dest)

We also need the following additional imports to use Javassist:

import javassist.ClassPath
import javassist.ClassPool
import javassist.CtNewMethod

Once we have located the JAR file we want to modify, we add it to our classpath and retrieve the class we are interested in (❸). We locate the appropriate method and delete it (❹). Then, we add our custom method using the same name (❺). The whole operation is done in memory. We retrieve the bytecode of the modified class in ❻.

The remaining step is to rebuild the JAR file:

def input = new JarFile(src)
def output = new JarOutputStream(new FileOutputStream(dest))

// ❼
input.entries().each {
    if (!it.getName().equals("org/xwalk/core/internal/SslUtil.class")) {
        def s = input.getInputStream(it)
        output.putNextEntry(new JarEntry(it.getName()))
        IOUtils.copy(s, output)
        s.close()
    }
}

// ❽
output.putNextEntry(new JarEntry("org/xwalk/core/internal/SslUtil.class"))
output.write(sslUtilBytecode)

input.close()
output.close()

We need the following additional imports:

import java.util.jar.JarEntry
import java.util.jar.JarFile
import java.util.jar.JarOutputStream
import org.apache.commons.io.IOUtils

There are two steps. In ❼, all classes are copied to the new JAR, except the SslUtil class. In ❽, the modified bytecode for SslUtil is added to the JAR.
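Since a JAR is a plain ZIP archive, the same two steps can be expressed with any ZIP library. Here is a stdlib-only Python sketch of the replace-one-entry pattern (the helper name is mine, not part of the build script):

```python
import io
import zipfile

def replace_entry(jar_bytes: bytes, name: str, new_data: bytes) -> bytes:
    """Rebuild a JAR (a plain ZIP): copy every entry except `name`
    (the equivalent of step ❼), then append `name` with `new_data`
    (the equivalent of step ❽)."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for entry in src.infolist():
            if entry.filename != name:
                dst.writestr(entry, src.read(entry))
        dst.writestr(name, new_data)
    return out.getvalue()
```

Copying everything else first and appending the replacement keeps the archive valid without touching entry offsets by hand, which is exactly why the Groovy version filters on the entry name instead of patching in place.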

That’s all! You can view the complete example on GitHub.

More complex method replacement§

In the above example, the new method doesn’t use any external dependency. Let’s suppose we also want to replace the sslErrorFromNetErrorCode() method from the same class with the following one:


// In SslUtil class
public static SslError sslErrorFromNetErrorCode(int error,
                                                SslCertificate cert,
                                                String url) {
    switch(error) {
        case NetError.ERR_CERT_COMMON_NAME_INVALID:
            return new SslError(SslError.SSL_IDMISMATCH, cert, url);
        case NetError.ERR_CERT_DATE_INVALID:
            return new SslError(SslError.SSL_DATE_INVALID, cert, url);
        case NetError.ERR_CERT_AUTHORITY_INVALID:
            return new SslError(SslError.SSL_UNTRUSTED, cert, url);
    }
    return new SslError(SslError.SSL_INVALID, cert, url);
}
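The switch is a straight lookup, which can be modeled as a table. In the Python sketch below, the numeric codes follow Chromium’s net_error_list.h, and the SSL_* strings stand in for the android.net.http.SslError constants:

```python
# Chromium certificate error codes (see net_error_list.h).
ERR_CERT_COMMON_NAME_INVALID = -200
ERR_CERT_DATE_INVALID = -201
ERR_CERT_AUTHORITY_INVALID = -202

# Map each net error to the matching SslError constant name;
# anything unknown falls back to SSL_INVALID, as in the Java code.
_SSL_ERROR = {
    ERR_CERT_COMMON_NAME_INVALID: "SSL_IDMISMATCH",
    ERR_CERT_DATE_INVALID: "SSL_DATE_INVALID",
    ERR_CERT_AUTHORITY_INVALID: "SSL_UNTRUSTED",
}

def ssl_error_from_net_error(code):
    return _SSL_ERROR.get(code, "SSL_INVALID")
```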

The major difference with the previous example is that we need to import some additional classes.

Android SDK import§

The classes from the Android SDK are not part of the external dependencies. They need to be imported separately. The full path of the JAR file is:

androidJar = "${android.getSdkDirectory().getAbsolutePath()}/platforms/" +
             "${android.getCompileSdkVersion()}/android.jar"

We need to load it before adding the new method into the SslUtil class:

def pool = new ClassPool()
pool.insertClassPath(androidJar)
def ctc = pool.get('org.xwalk.core.internal.SslUtil')
def ctm = ctc.getDeclaredMethod('sslErrorFromNetErrorCode')
// …

External dependency import§

We must also import the classes used by the new method (such as NetError) and therefore, we need to put the appropriate JAR in our classpath. The easiest way is to iterate through all the external dependencies and add them to the classpath.

def pool = new ClassPool()
pool.insertClassPath(androidJar)
inputs.each {
    it.jarInputs.each {
        def jarName = it.name
        def src = it.getFile()
        def status = it.getStatus()
        if (status != Status.REMOVED) {
            pool.insertClassPath("${src}")
        }
    }
}

def ctc = pool.get('org.xwalk.core.internal.SslUtil')
def ctm = ctc.getDeclaredMethod('sslErrorFromNetErrorCode')
// Then, rebuild the JAR...

Happy hacking!

  1. Before Android 4.4, the webview was severely outdated. Starting from Android 5, the webview is shipped as a separate component with updates. Embedding Crosswalk is still convenient as you know exactly which version you can rely on. 

  2. I hope to have this fixed in later versions. 

  3. This may seem harmful and you are right. However, if you have an internal CA, it is currently not possible to provide a custom trust store to a webview. Moreover, the system trust store is not used either. You may also want to use TLS for authentication only, with client certificates, a feature supported by Dashkiosk.

  4. Crosswalk being an open source project, an alternative would have been to patch the Crosswalk source code and recompile it. However, Crosswalk embeds Chromium, and recompiling the whole thing consumes a lot of resources. 

by Vincent Bernat at December 03, 2016 10:20 PM

November 30, 2016

Carl Chenet

My Free Software activities in November 2016

My monthly report for November 2016 gives an extended list of my Free Software related activities during this month.

Personal projects:

Journal du hacker:

The Journal du hacker is a French-speaking Hacker News-like website dedicated to the French-speaking Free and Open Source Software community.


That’s all folks! See you next month!

by Carl Chenet at November 30, 2016 11:00 PM

November 28, 2016

Michael Biven

It Was Never About Ops

For a while I’ve been thinking about Susan J. Fowler’s Ops Identity Crisis post. Bits I agreed with and some I did not.

My original reaction to the post was pretty pragmatic. I had concerns (and still do) about adding more responsibilities onto software engineers (SWE). It’s already fairly common to have them responsible for QA, database administration, and security tasks, but now ops issues are being considered as well.

I suspect there is an as-yet-unmeasured negative impact to the productivity of teams that keep expanding the role of SWE. You end up deprioritizing the operations and systems related concerns, because new features and bug fixes will always win out when scheduling tasks.

Over time I’ve refined my reaction to this. The traditional operations role is the hole that you’ve been stuck in and engineering is how you get out of that hole. It’s not that you don’t need Ops teams or engineers anymore. It’s simply that you’re doing it wrong.

It was never solely about operations. There’s always been an implied hint of manual effort for most Ops teams. We’ve seen a quick return from having SWE handle traditional ops tasks, but that doesn’t mean that the role won’t be needed anymore. Previously we’ve been able to add people to ops to continue to scale with the growth of the company, but those days are gone for most. What needs to change is how we approach the work and the skills needed to do the job.

When you’re unable to scale to meet the demands you’ll end up feeling stuck in a reactive and constantly interruptible mode of working. This can then make operations feel more like a burden rather than a benefit. This way of thinking is part of the reason why I think many of the tools and services created by ops teams are not thought of as actual products.

Ever since we got an API to interact with the public cloud, and later the private cloud, we’ve been trying to redefine the role of ops. As the ecosystem of tools has grown and changed over time we’ve continued to refine that role. While thinking on the impact of Fowler’s post, I know that I agree with her that the skills needed are different from what they were eight years ago, but the need for the role hasn’t decreased. Instead it’s grown to match the growth of the products it has been supporting. This got me thinking about how I’ve been working during those eight years, and looking back it’s easy to see what worked and what didn’t. These are the bits that worked for me.

First don’t call it ops anymore. Sometimes names do matter. By continuing to use “Ops” in the team name or position we continue to hold onto that reactive mindset.

Make scalability your main priority and start building the products and services that become the figurative ladder to get you out of the hole you’re in. I believe you can meet that by focusing on three things: Reliability, Availability, and Serviceability.

For anything to scale it first needs to be reliable and available if people are going to use it. To be able to meet the demands of the growth you’re seeing the products need to be serviceable. You must be able to safely make changes to them in a controlled and observable fashion.

Every product built out of these efforts should be considered a first class citizen. The public facing services and the internal ones should be considered equals. If your main priority is scaling to meet the demands of your growth, then there should be no difference in how you design, build, maintain, or consider anything no matter where it is in the stack.

Focus on scaling your engineers and making them force multipliers for the company. There is a cognitive load placed on individuals, teams, and organizations for the work they do. Make sure to consider this in the same way we think of the load capacity of a system. At a time where we’re starting to see companies break their products into smaller more manageable chunks (microservices), we’re close to doing the exact opposite for our people and turning the skills needed to do the work into one big monolith.

If you’ve ever experienced yourself or have seen any of your peers go through burnout what do you think is going to happen as we continue to pile on additional responsibilities?

The growth we’re seeing is the result of businesses starting to run into web scale problems.

Web scale describes the tendency of modern sites – especially social ones – to grow at (far-)greater-than-linear rates. Tools that claim to be “web scale” are (I hope) claiming to handle rapid growth efficiently and not have bottlenecks that require rearchitecting at critical moments. The implication for “web scale” operations engineers is that we have to understand this non-linear network effect. Network effects have a profound effect on architecture, tool choice, system design, and especially capacity planning.

Jacob Kaplan-Moss

The real problem I think we’ve always been facing is making sure you have the people you need to do the job. Before we hit web scale issues we could usually keep up by having people pull all-nighters, work through weekends, or, if you’re lucky, by hiring more. But hard work can no longer make up for shortcomings in planning or preparation. The problem has never been with the technology or the challenges you’re facing. It’s always been about having the right people.

In short you can …

  1. Expect services to grow at a non-linear rate.
  2. To be able to keep up with this growth you’ll need to scale both your people and your products.
  3. Scale your people by giving them the time and space to focus on scaling your products.
  4. Scale your products by focusing on Reliability, Availability, and Serviceability.

To think that new tools or services will be the only (or main) answer to the challenges brought on by growth is a mistake. You will always need people to understand what is happening and then design, implement, and maintain your solutions. These new tools and services should increase the impact of each member of your team, but it is a mistake to think they will replace a role.

November 28, 2016 11:11 PM

November 27, 2016

Toolsmith - GSE Edition: Scapy vs CozyDuke

In continuation of observations from my GIAC Security Expert re-certification process, I'll focus here on a GCIA-centric topic: Scapy. Scapy is essential to the packet analyst skill set on so many levels. For your convenience, the Packetrix VM comes preconfigured with Scapy and Snort, so you're ready to go out of the gate if you'd like to follow along for a quick introduction.
Scapy is "a powerful interactive packet manipulation program. It is able to forge or decode packets of a wide number of protocols, send them on the wire, capture them, match requests and replies, and much more." This includes the ability to handle most tasks such as scanning, tracerouting, probing, unit tests, attacks or network discovery, thus replacing functionality expected from hping, 85% of nmap, arpspoof, tcpdump, and others.
If you'd really like to dig in, grab TJ O'Connor's  Violent Python: A Cookbook for Hackers, Forensic Analysts, Penetration Testers and Security Engineers (you should already have it), as first discussed here in January 2013. TJ loves him some Scapy: Detecting and Responding to Data Link Layer Attacks is another reference. :-)
You can also familiarize yourself with Scapy's syntax in short order with the SANS Scapy Cheat Sheet as well.
Judy Novak's SANS GIAC Certified Intrusion Analyst Day 5 content offers a nice set of walk-throughs using Scapy, and given that it is copyrighted and private material, I won't share them here, but will follow a similar path so you have something to play along with at home. We'll use a real-world APT scenario given recent and unprecedented Russian meddling in American politics. According to SC Magazine, "Russian government hackers apparently broke into the Democratic National Committee (DNC) computer systems" in infiltrations believed to be the work of two different Russian groups, namely Cozy Bear/CozyDuke/APT 29 and Fancy Bear/Sofacy/APT 28, working separately. As is often the case, ironically and consistently, one of the best overviews of CozyDuke behaviors comes via Kaspersky's Securelist. This article is cited as the reference in a number of Emerging Threats Snort/Suricata rules for CozyDuke. Among them, 2020962 - ET TROJAN CozyDuke APT HTTP Checkin, found in the trojan.rules file, serves as a fine exemplar.
I took serious liberties with the principles of these rules and oversimplified things significantly with a rule as added to my local.rules file on my Packetrix VM. I then took a few quick steps with Scapy to ensure that the rule would fire as expected. Of the IOCs derived from the Securelist article, we know a few things that, if built into a PCAP with Scapy, should cause the rule to fire when the PCAP is read via Snort.
  1. CozyDuke client to C2 calls were over HTTP
  2. Requests for C2 often included a .php reference, URLs included the likes of /ajax/index.php
  3. One of the C2 IPs cited in the article can be used as an example destination IP address
The resulting simpleton Snort rule appears in Figure 1.

Figure 1: Simple rule
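Figure 1 itself is not reproduced here, but an oversimplified rule in the spirit described, built only from the IOCs above, might look like the following (illustrative only; the msg, sid, and variables are placeholders, not the Emerging Threats rule):

alert tcp $HOME_NET any -> $EXTERNAL_NET 80 (msg:"LOCAL CozyDuke APT HTTP checkin (example)"; flow:established,to_server; content:"/ajax/index.php"; http_uri; classtype:trojan-activity; sid:1000001; rev:1;)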
To quickly craft a PCAP to trigger this rule, at a bash prompt, I ran scapy, followed by syn = IP(src="", dst="")/TCP(sport=1337, dport=80, flags="S")/"GET /ajax/index.php HTTP/1.1", then wrote the results out with wrpcap("/tmp/CozyDukeC2GET.pcap", syn), as seen in Figure 2.

Figure 2: Simple Scapy
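Under the hood, wrpcap() simply serializes packets into the classic libpcap container. For the curious, here is a stdlib-only Python sketch of that format (the function is mine, not Scapy's):

```python
import struct

def write_pcap(path, packets, linktype=1):
    """Write (timestamp, raw_bytes) pairs in the classic libpcap
    format -- the same container Scapy's wrpcap() emits.
    linktype=1 is LINKTYPE_ETHERNET."""
    with open(path, "wb") as f:
        # Global header (24 bytes): magic, major/minor version,
        # thiszone, sigfigs, snaplen, network (linktype).
        f.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, linktype))
        for ts, data in packets:
            sec = int(ts)
            usec = int(round((ts - sec) * 1_000_000))
            # Per-packet record header (16 bytes):
            # ts_sec, ts_usec, incl_len, orig_len.
            f.write(struct.pack("<IIII", sec, usec, len(data), len(data)))
            f.write(data)
```

Reading the result back with tcpdump -r or Snort works because the 24-byte global header plus 16-byte per-record headers are all there is to the container.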
Then a quick run of the resulting file through Snort with snort -A console -q -K none -r /tmp/CozyDukeC2GET.pcap -c ../etc/snort.conf, and we have a hit as seen in Figure 3.

Figure 3: Simple result

Scapy is ridiculously powerful and is given no justice here; hopefully this is just enough information to entice you to explore further. With just the principles established here, you can list the options available for crafting and manipulating packets with ls(TCP) and ls(IP).
Figure 4: ls()

If you're studying for the likes of GCIA or just looking to improve your understanding of TCP/IP and NSM, no better way to do so than with Scapy.
Cheers...until next time.

by Russ McRee at November 27, 2016 07:04 PM