Planet SysAdmin


June 25, 2017

Chris Siebenmann

One tradeoff in email system design is who holds problematic email

When you design parts of a mail system, for example a SMTP submission server that users will send their email out through or your external MX gateway for inbound email, you often face a choice of whether your systems should accept email aggressively or be conservative and leave email in the hands of the sender. For example, on a submission server should you accept email from users with destination addresses that you know are bad, or should you reject such addresses during the SMTP conversation?

In theory, the SMTP RFCs combined with best practices give you an unambiguous answer; here, the answer would be that clearly the submission server should reject known-bad addresses at SMTP time. In practice things are not so simple; generally you want problematic email handled by the system that can do the best job of dealing with it. For instance, you may be extremely dubious about how well your typical mail client (MUA) will handle things like permanent SMTP rejections on RCPT TO addresses, or temporary deferrals in general. In this case it can make a lot of sense to have the submission machine accept almost everything and sort it out later, sending explicit bounce messages to users if addresses fail. That way at least you know that users will get definite notification that certain addresses failed.

A similar tradeoff applies on your external MX gateway. You could insist on 'cut-through routing', where you don't say 'yes' during the initial SMTP conversation until the mail has been delivered all the way to its eventual destination; if there's a problem at some point, you give a temporary failure and the sender's MTA holds on to the message. Or you could feel it's better for your external MX gateway to hold inbound email when there's some problem with the rest of your mail system, because that way you can strongly control stuff like how fast email is retried and when it times out.

Our current mail system (which is mostly described here) has generally been biased towards holding the email ourselves. In the case of our user submission machines this was an explicit decision because at the time we felt we didn't trust mail clients enough. Our external MX gateway accepted all valid local destinations for multiple reasons, but a sufficient one is that Exim didn't support 'cut-through routing' at the time so we had no choice. These choices are old ones, and someday we may revisit some of them. For example, perhaps mail clients today have perfectly good handling of permanent failures on RCPT TO addresses.
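
(As an aside, more recent Exim versions do offer such a feature as an ACL control. A rough sketch of what enabling it in the RCPT ACL of an MX gateway might look like; the ACL name and layout here are illustrative, not taken from any real configuration:)

acl_check_rcpt:
  # Verify the recipient and, if it passes, attempt delivery to the next
  # hop while the sending system is still connected, so failures can be
  # reported during the original SMTP conversation instead of via bounces.
  accept  verify  = recipient
          control = cutthrough_delivery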

(An accept, store, and forward model exposes some issues you might want to think about, but that's a separate concern.)

(We haven't attempted to test current mail clients, partly because there are so many of them. 'Accept then bounce' also has the benefit that it's conservative; it works with anything and everything, and we know exactly what users are going to get.)

by cks at June 25, 2017 05:01 AM

Errata Security

A kindly lesson for you non-techies about encryption

The following tweets need to be debunked:



The answer to John Schindler's question is:
every expert in cryptography doesn't know this
Oh, sure, you can find a fringe wacko who also knows crypto and agrees with you, but all the sane members of the security community will not.


Telegram is not trustworthy because it's closed-source. We can't see how it works. We don't know if they've made accidental mistakes that can be hacked. We don't know if they've been bribed by the NSA or Russia to put backdoors in their program. In contrast, PGP and Signal are open-source. We can read exactly what the software does. Indeed, thousands of people have been reviewing their software looking for mistakes and backdoors.

Encryption works. Neither the NSA nor the Russians can break properly encrypted content. There's no such thing as "military grade" encryption that is better than consumer grade. There's only encryption that nobody can hack vs. encryption that your neighbor's teenage kid can easily hack. There's essentially nothing in between. Those scenes in TV/movies about breaking encryption are as realistic as sound in space: good for dramatic presentation, but not how things work in the real world.

In particular, end-to-end encryption works. Sure, in the past, such apps only encrypted as far as the server, so whoever ran the server could read your messages. Modern chat apps, though, are end-to-end: the servers have absolutely no ability to decrypt what's on them, unless they can get the decryption keys from the phones.

Thus, in contrast to what John Schindler says, we techies know that Russian authorities do not have access to Signal and PGP messages.


Snowden hatred has become the anti-vax of crypto. Sure, there's no particular reason to trust Snowden -- people should really stop treating him as some sort of privacy-Jesus. But there's no particular reason to distrust him, either. His bland statements on crypto are indistinguishable from any other crypto-enthusiast statements. If he's a Russian pawn, then so too is the bulk of the crypto community.


With all this said, using Signal doesn't make you perfectly safe. The person you are chatting with could be a secret agent. There could be cameras/microphones in the room where you are using the app. The Russians can also hack into your phone, and likewise eavesdrop on everything you do with the phone, regardless of which app you use. And they probably have hacked specific people's phones. On the other hand, if the NSA or Russians were widely hacking phones, we'd detect that this was happening. We haven't.

Signal is therefore not a guarantee of safety, because nothing is, and if your life depends on it, you can't trust any simple advice like "use Signal". But, for the bulk of us, it's pretty damn secure, and I trust that neither the Russians nor the NSA is reading my Signal or PGP messages.


At first blush, this @20committee tweet appears to be non-experts opining on things outside their expertise. But in reality, it's just obtuse partisanship, where truth and expertise don't matter. Nothing you or I say can change some people's minds on this matter, no matter how much our expertise gives weight to our words. This post is instead for bystanders, who don't know enough to judge whether these crazy statements have merit.


Bonus:

So let's talk about "every crypto expert". It's, of course, impossible to speak for every crypto expert. It's like citing the consensus among climate scientists that mankind is warming the globe while ignoring the widespread disagreement over how much warming that is.

The same is true here. You'll get a wide range of responses from experts about the above tweet. Some, for example, will stress my point at the bottom that hacking the endpoint (the phone) breaks all the apps, and thus justify the above tweet from that point of view. Others will point out that all software has bugs, and it's quite possible that Signal has some unknown bug that the Russians are exploiting.

So I'm not attempting to speak for everything experts might say here, or for the longer lectures they could give on the subject. I am, though, pointing out the basics that virtually everyone agrees on: the consensus view that open-source, working crypto holds up.

by Robert Graham (noreply@blogger.com) at June 25, 2017 03:46 AM

June 24, 2017

Chris Siebenmann

In praise of uBlock Origin's new 'element zapper' feature

The latest versions of uBlock Origin have added a new feature, the element zapper. To quote the documentation:

The purpose of the element zapper is to quickly deal with the removal of nuisance elements on a page without having to create one or more filters.

uBlock Origin has always allowed you to permanently block page elements, and a while back I started using it aggressively to deal with the annoyances of modern websites. This is fine and works nicely, but it takes work. I have to carefully pick out what I want to target, maybe edit the CSS selector uBlock Origin has found, preview what I'm actually going to be blocking, and then I have a new permanent rule cluttering up my filters (and probably slightly growing Firefox's memory usage). This work is worth it for things that I'm going to visit regularly, but some combination of the amount of work required and the fact that I'd be picking up a new permanent rule made me not do it for pages I was basically just visiting once. And usually things weren't all that annoying.

Enter Medium and their obnoxious floating sharing bar at the bottom of pages. These things can be blocked on Medium's website itself with a straightforward rule, but the problem is that tons of people use Medium with custom domains. For example, this article that I linked to in a recent entry. These days it seems like every fourth article I read is on some Medium-based site (I exaggerate, but), and each of them has the Medium sharing bar, and each of them needs a new site-specific blocking rule unless I want to globally block all <div>s with the class js-stickyFooter (until Medium changes the name).

(Globally blocking such a <div> is getting really tempting, though. Medium feels like a plague at this point.)

The element zapper feature deals with this with no fuss or muss. If I wind up reading something on yet another site that's using Medium and has their floating bar, I can zap it away in seconds. The same is true of any number of floating annoyances. And if I made a mistake and my zapping isn't doing what I want, it's easy to fix; since these are one-shot rules, I can just reload the page to start over from scratch. This has already started encouraging me to do away with even more things than before, and just like when I started blocking elements, I feel much happier when I'm reading the resulting pages.

(Going all the way to using Firefox's Reader mode is usually too much of a blunt hammer for most sites, and often I don't care quite that much.)

PS: Now that I think about it, I probably should switch all of my per-site blocks for Medium's floating bar over to a single '##div.js-stickyFooter' block. It's unlikely to cause any collateral damage and I suspect it would actually be more memory and CPU efficient.
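
(For reference, the cosmetic filter syntax involved looks like this; the custom domain in the last line is just a made-up example:)

! hide the sharing bar everywhere (the single global rule)
##div.js-stickyFooter
! or hide it per site, one rule for each Medium-hosted domain
medium.com##div.js-stickyFooter
some-medium-site.example##div.js-stickyFooter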

(And I should probably check over my personal block rules in general, although I don't have too many of them.)

by cks at June 24, 2017 03:16 AM

Links: Git remote branches and Git's missing terminology (and more)

Mark Jason Dominus's Git remote branches and Git's missing terminology (via) is about what it sounds like. This is an issue of some interest to me, since I flailed around in the same general terminology swamp in a couple of recent entries. Dominus explains all of this nicely, with diagrams, and it helped reinforce things in my mind (and reassure me that I more or less understood what was going on).

He followed this up with Git's rejected push error, which covers a frequent 'git push' issue with the same thoroughness as his first article. I more or less knew this stuff, but again I found it useful to read through his explanation to make sure I actually knew as much as I thought I did.

by cks at June 24, 2017 02:30 AM

June 23, 2017

Evaggelos Balaskas

Visiting ProgressBar HackerSpace in Bratislava

When traveling, I make an effort to visit the local hackerspace. I understand that this is not normal behavior for many people, but for us (free / opensource advocates) it is always a must.

This was my 4th week in Bratislava and, for the first time, I had a couple of free hours to visit ProgressBar HackerSpace.

For now, they are located in the middle of the historic city, on the 2nd floor. The entrance is on a covered walkway (gallery) between two buildings. There is a bell to ring and, when members are already inside, the door opens automatically for any visitor. No need to wait or explain why you are there!

Entering ProgressBar there is no doubt that you are entering a hackerspace.

ProgressBar

You can view a few photos by clicking here: ProgressBar - Photos

And you can find ProgressBar on OpenStreet Map

Some cool-notable projects:

  • bitcoin vending machine
  • robot arm to fetch clubmate
  • magic wood to switch on/off lights
  • blinkwall
  • Cool T-shirts

Their lab is full of almost anything you need to play/hack with.

I was really glad to make time and visit them.

June 23, 2017 11:34 AM

June 21, 2017

Vincent Bernat

IPv4 route lookup on Linux

TL;DR: With its implementation of IPv4 routing tables using LPC-tries, Linux offers good lookup performance (50 ns for a full view) and low memory usage (64 MiB for a full view).


During the lifetime of an IPv4 datagram inside the Linux kernel, one important step is the route lookup for the destination address through the fib_lookup() function. From essential information about the datagram (source and destination IP addresses, interfaces, firewall mark, …), this function should quickly provide a decision. Some possible options are:

  • local delivery (RTN_LOCAL),
  • forwarding to a supplied next hop (RTN_UNICAST),
  • silent discard (RTN_BLACKHOLE).

Since 2.6.39, Linux stores routes into a compressed prefix tree (commit 3630b7c050d9). In the past, a route cache was maintained but it has been removed[1] in Linux 3.6.

Route lookup in a trie

Looking up a route in a routing table means finding the most specific prefix matching the requested destination. Let's assume the following routing table:

$ ip route show scope global table 100
default via 203.0.113.5 dev out2
192.0.2.0/25
        nexthop via 203.0.113.7  dev out3 weight 1
        nexthop via 203.0.113.9  dev out4 weight 1
192.0.2.47 via 203.0.113.3 dev out1
192.0.2.48 via 203.0.113.3 dev out1
192.0.2.49 via 203.0.113.3 dev out1
192.0.2.50 via 203.0.113.3 dev out1
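
As an aside, a table like this can be built with ip route add. Here is a rough sketch, reusing the gateways and interfaces from the listing above (they have to exist on the machine); the remaining /32 routes are added the same way:

# ip route add default via 203.0.113.5 dev out2 table 100
# ip route add 192.0.2.0/25 table 100 \
      nexthop via 203.0.113.7 dev out3 weight 1 \
      nexthop via 203.0.113.9 dev out4 weight 1
# ip route add 192.0.2.47 via 203.0.113.3 dev out1 table 100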

Here are some examples of lookups and the associated results:

Destination IP   Next hop
192.0.2.49       203.0.113.3 via out1
192.0.2.50       203.0.113.3 via out1
192.0.2.51       203.0.113.7 via out3 or 203.0.113.9 via out4 (ECMP)
192.0.2.200      203.0.113.5 via out2

A common structure for route lookup is the trie, a tree structure in which each node's prefix is an extension of its parent's prefix.

Lookup with a simple trie

The following trie encodes the previous routing table:

Simple routing trie

For each node, the prefix is known by its path from the root node and the prefix length is the current depth.

A lookup in such a trie is quite simple: at each step, fetch the nth bit of the IP address, where n is the current depth. If it is 0, continue with the first child. Otherwise, continue with the second. If a child is missing, backtrack until a routing entry is found. For example, when looking for 192.0.2.50, we will find the result in the corresponding leaf (at depth 32). However for 192.0.2.51, we will reach 192.0.2.50/31 but there is no second child. Therefore, we backtrack until the 192.0.2.0/25 routing entry.
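
To make the procedure concrete, here is a rough sketch of that lookup in Python. This is only the algorithm described above, not the kernel's code, and the node layout is an assumption for illustration:

# Minimal sketch of a lookup in a simple (uncompressed) binary trie.
# Each node is a dict: {"children": [left, right], "route": entry or None}.
# Addresses are plain integers, e.g. int(ipaddress.IPv4Address("192.0.2.50")).

def bit(addr, depth):
    """Return bit number `depth` of a 32-bit address, most significant bit first."""
    return (addr >> (31 - depth)) & 1

def lookup(root, addr):
    best = None                    # most specific routing entry seen so far
    node, depth = root, 0
    while node is not None:
        if node["route"] is not None:
            best = node["route"]   # remembering it replaces explicit backtracking
        if depth == 32:
            break
        node = node["children"][bit(addr, depth)]
        depth += 1
    return best                    # None means nothing matched, not even a default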

Adding and removing routes is quite easy. From a performance point of view, the lookup is done in constant time relative to the number of routes (due to maximum depth being capped to 32).

Quagga is an example of routing software still using this simple approach.

Lookup with a path-compressed trie

In the previous example, most nodes only have one child. This leads to a lot of unneeded bitwise comparisons and memory is also wasted on many nodes. To overcome this problem, we can use path compression: each node with only one child is removed (except if it also contains a routing entry). Each remaining node gets a new property telling how many input bits should be skipped. Such a trie is also known as a Patricia trie or a radix tree. Here is the path-compressed version of the previous trie:

Patricia trie

Since some bits have been ignored, on a match, a final check is executed to ensure all bits from the found entry are matching the input IP address. If not, we must act as if the entry wasn’t found (and backtrack to find a matching prefix). The following figure shows two IP addresses matching the same leaf:

Lookup in a Patricia trie

The reduction in the average depth of the tree compensates for the need to handle those false positives. The insertion and deletion of a routing entry are still easy enough.
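
Continuing the Python sketch from above, a path-compressed lookup additionally skips bits and performs the final check just described. Again, the node fields are assumptions for illustration only:

# Sketch of a lookup in a path-compressed trie. On top of the previous
# layout, each node stores how many bits were skipped ("skip") plus its own
# prefix and prefix length, so candidates can be verified against the input.
# bit() is the helper from the previous sketch.

def prefix_matches(addr, prefix, plen):
    """Final check: do the first `plen` bits of addr equal `prefix`?"""
    return plen == 0 or (addr >> (32 - plen)) == (prefix >> (32 - plen))

def lookup_compressed(root, addr):
    best, node, depth = None, root, 0
    while node is not None:
        depth += node["skip"]                       # bits removed by compression
        if node["route"] is not None and prefix_matches(addr, node["prefix"], node["plen"]):
            best = node["route"]                    # candidate verified, skipped bits included
        if depth >= 32:
            break
        node = node["children"][bit(addr, depth)]
        depth += 1
    return best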

Many routing systems use Patricia trees.

Lookup with a level-compressed trie

In addition to path compression, level compression[2] detects parts of the trie that are densely populated and replaces them with a single node and an associated vector of 2^k children. This node will handle k input bits instead of just one. For example, here is a level-compressed version of our previous trie:

Level-compressed trie

Such a trie is called an LC-trie or LPC-trie and offers higher lookup performance than a radix tree.

A heuristic is used to decide how many bits a node should handle. On Linux, if the ratio of non-empty children to all children would be above 50% when the node handles an additional bit, the node gets this additional bit. On the other hand, if the current ratio is below 25%, the node loses the responsibility of one bit. Those values are not tunable.

Insertion and deletion become more complex, but lookup times are also improved.

Implementation in Linux

The implementation for IPv4 in Linux exists since 2.6.13 (commit 19baf839ff4a) and is enabled by default since 2.6.39 (commit 3630b7c050d9).

Here is the representation of our example routing table in memory[3]:

Memory representation of a trie

There are several structures involved; the most important ones appear in the memory representation above.

The trie can be retrieved through /proc/net/fib_trie:

$ cat /proc/net/fib_trie
Id 100:
  +-- 0.0.0.0/0 2 0 2
     |-- 0.0.0.0
        /0 universe UNICAST
     +-- 192.0.2.0/26 2 0 1
        |-- 192.0.2.0
           /25 universe UNICAST
        |-- 192.0.2.47
           /32 universe UNICAST
        +-- 192.0.2.48/30 2 0 1
           |-- 192.0.2.48
              /32 universe UNICAST
           |-- 192.0.2.49
              /32 universe UNICAST
           |-- 192.0.2.50
              /32 universe UNICAST
[...]

For internal nodes, the numbers after the prefix are:

  1. the number of bits handled by the node,
  2. the number of full children (they only handle one bit),
  3. the number of empty children.

For example, "+-- 192.0.2.48/30 2 0 1" describes an internal node handling 2 bits (four child slots), with no full children and one empty slot (there is no route for 192.0.2.51).

Moreover, if the kernel was compiled with CONFIG_IP_FIB_TRIE_STATS, some interesting statistics are available in /proc/net/fib_triestat[4]:

$ cat /proc/net/fib_triestat
Basic info: size of leaf: 48 bytes, size of tnode: 40 bytes.
Id 100:
        Aver depth:     2.33
        Max depth:      3
        Leaves:         6
        Prefixes:       6
        Internal nodes: 3
          2: 3
        Pointers: 12
Null ptrs: 4
Total size: 1  kB
[...]

When a routing table is very dense, a node can handle many bits. For example, a densely populated routing table with 1 million entries packed in a /12 can have one internal node handling 20 bits. In this case, route lookup is essentially reduced to a lookup in a vector.

The following graph shows the number of internal nodes used relative to the number of routes for different scenarios (routes extracted from an Internet full view, /32 routes spread over 4 different subnets with various densities). When routes are densely packed, the number of internal nodes is quite limited.

Internal nodes and null pointers

Performance

So how performant is a route lookup? The maximum depth stays low (about 6 for a full view), so a lookup should be quite fast. With the help of a small kernel module, we can accurately benchmark[5] the fib_lookup() function:

Maximum depth and lookup time

The lookup time is loosely tied to the maximum depth. When the routing table is densely populated, the maximum depth is low and the lookup times are fast.

When forwarding at 10 Gbps, the time budget for a packet would be about 50 ns. Since this is also the time needed for the route lookup alone in some cases, we wouldn’t be able to forward at line rate with only one core. Nonetheless, the results are pretty good and they are expected to scale linearly with the number of cores.

Another interesting figure is the time it takes to insert all those routes into the kernel. Linux is also quite efficient in this area since you can insert 2 million routes in less than 10 seconds:

Insertion time

Memory usage

The memory usage is available directly in /proc/net/fib_triestat. The statistic provided doesn’t account for the fib_info structures, but you should only have a handful of them (one for each possible next-hop). As you can see on the graph below, the memory use is linear with the number of routes inserted, whatever the shape of the routes is.

Memory usage

The results are quite good. With only 256 MiB, about 2 million routes can be stored!

Routing rules

Unless configured without CONFIG_IP_MULTIPLE_TABLES, Linux supports several routing tables and has a system of configurable rules to select the table to use. These rules can be configured with ip rule. By default, there are three of them:

$ ip rule show
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default

Linux will first look up a match in the local table. If it doesn't find one, it will look in the main table and, as a last resort, in the default table.

Builtin tables

The local table contains routes for local delivery:

$ ip route show table local
broadcast 127.0.0.0 dev lo proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1
broadcast 192.168.117.0 dev eno1 proto kernel scope link src 192.168.117.55
local 192.168.117.55 dev eno1 proto kernel scope host src 192.168.117.55
broadcast 192.168.117.63 dev eno1 proto kernel scope link src 192.168.117.55

This table is populated automatically by the kernel when addresses are configured. Let's look at the last three lines. When the IP address 192.168.117.55 was configured on the eno1 interface, the kernel automatically added the appropriate routes:

  • a route for 192.168.117.55 for local unicast delivery to the IP address,
  • a route for 192.168.117.63 for broadcast delivery to the broadcast address,
  • a route for 192.168.117.0 for broadcast delivery to the network address.

When 127.0.0.1 was configured on the loopback interface, the same kind of routes were added to the local table. However, a loopback address receives special treatment and the kernel also adds the whole subnet to the local table. As a result, you can ping any IP in 127.0.0.0/8:

$ ping -c1 127.42.42.42
PING 127.42.42.42 (127.42.42.42) 56(84) bytes of data.
64 bytes from 127.42.42.42: icmp_seq=1 ttl=64 time=0.039 ms

--- 127.42.42.42 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.039/0.039/0.039/0.000 ms

The main table usually contains all the other routes:

$ ip route show table main
default via 192.168.117.1 dev eno1 proto static metric 100
192.168.117.0/26 dev eno1 proto kernel scope link src 192.168.117.55 metric 100

The default route has been configured by some DHCP daemon. The connected route (scope link) has been automatically added by the kernel (proto kernel) when configuring an IP address on the eno1 interface.

The default table is empty and has little use. It was kept when the current incarnation of advanced routing was introduced in Linux 2.1.68, after a first attempt using "classes" in Linux 2.1.15[6].

Performance

Since Linux 4.1 (commit 0ddcf43d5d4a), when the set of rules is left unmodified, the main and local tables are merged and the lookup is done with this single table (and the default table if not empty). Without specific rules, there is no performance hit when enabling the support for multiple routing tables. However, as soon as you add new rules, some CPU cycles will be spent for each datagram to evaluate them. Here are a couple of graphs demonstrating the impact of routing rules on lookup times:

Routing rules impact on performance

For some reason, the relation is linear when the number of rules is between 1 and 100 but the slope increases noticeably past this threshold. The second graph highlights the negative impact of the first rule (about 30 ns).

A common use of rules is to create virtual routers: interfaces are segregated into domains and when a datagram enters through an interface from domain A, it should use routing table A:

# ip rule add iif vlan457 table 10
# ip rule add iif vlan457 blackhole
# ip rule add iif vlan458 table 20
# ip rule add iif vlan458 blackhole

The blackhole rules may be removed if you are sure there is a default route in each routing table. For example, we add a blackhole default with a high metric to not override a regular default route:

# ip route add blackhole default metric 9999 table 10
# ip route add blackhole default metric 9999 table 20
# ip rule add iif vlan457 table 10
# ip rule add iif vlan458 table 20

To reduce the impact on performance when many interface-specific rules are used, interfaces can be attached to VRF instances and a single rule can be used to select the appropriate table:

# ip link add vrf-A type vrf table 10
# ip link set dev vrf-A up
# ip link add vrf-B type vrf table 20
# ip link set dev vrf-B up
# ip link set dev vlan457 master vrf-A
# ip link set dev vlan458 master vrf-B
# ip rule show
0:      from all lookup local
1000:   from all lookup [l3mdev-table]
32766:  from all lookup main
32767:  from all lookup default

The special l3mdev-table rule was automatically added when configuring the first VRF interface. This rule will select the routing table associated with the VRF owning the input (or output) interface.

VRF was introduced in Linux 4.3 (commit 193125dbd8eb), the performance was greatly enhanced in Linux 4.8 (commit 7889681f4a6c) and the special routing rule was also introduced in Linux 4.8 (commit 96c63fa7393d, commit 1aa6c4f6b8cd). You can find more details about it in the kernel documentation.

Conclusion

The takeaways from this article are:

  • route lookup times hardly increase with the number of routes,
  • densely packed /32 routes lead to amazingly fast route lookups,
  • memory use is low (128 MiB per million routes),
  • no optimization is done on routing rules.

  1. The routing cache was subject to reasonably easy to launch denial of service attacks. It was also believed to be inefficient for high volume sites like Google, but I have first-hand experience that this was not the case for moderately high volume sites. 

  2. "IP-address lookup using LC-tries", IEEE Journal on Selected Areas in Communications, 17(6):1083-1092, June 1999. 

  3. For internal nodes, the key_vector structure is embedded into a tnode structure. This structure contains information rarely used during lookup, notably the reference to the parent that is usually not needed for backtracking as Linux keeps the nearest candidate in a variable. 

  4. One leaf can contain several routes (struct fib_alias is a list). The number of "prefixes" can therefore be greater than the number of leaves. The system also keeps statistics about the distribution of the internal nodes relative to the number of bits they handle. In our example, all three internal nodes handle 2 bits. 

  5. The measurements are done in a virtual machine with one vCPU. The host is an Intel Core i5-4670K running at 3.7 GHz during the experiment (CPU governor was set to performance). The kernel is Linux 4.11. The benchmark is single-threaded. It runs a warm-up phase, then executes about 100,000 timed iterations and keeps the median. Timings of individual runs are computed from the TSC.

  6. Fun fact: the documentation of this first attempt at more flexible routing is still available in today's kernel tree and explains the usage of the "default class".

by Vincent Bernat at June 21, 2017 09:19 AM

Errata Security

Some notes on #MacronLeak

Tonight (Friday May 5 2017) hackers dumped emails (and docs) related to French presidential candidate Emmanuel Macron. He's the anti-Putin candidate running against the pro-Putin Marine Le Pen. I thought I'd write up some notes.


Are they Macron's emails?

No. They are e-mails from members of his staff/supporters, namely Alain Tourret, Pierre Person, Cédric O, Anne-Christine Lang, and Quentin Lafay.

There are some documents labeled "Macron" which may have been taken from his computer or cloud drive -- his own, or an assistant's.


Who done it?

Obviously, everyone assumes that Russian hackers did it, but there's nothing (so far) that points to anybody in particular.

It appears to be the most basic of phishing attacks, which means anyone could've done it, including your neighbor's pimply faced teenager.

Update: Several people [*] have pointed out Trend Micro reporting that Russian/APT28 hackers were targeting Macron back on April 24. Coincidentally, this is also the latest that emails appear in the dump.


What's the hacker's evil plan?

Everyone is proposing theories about the hacker's plan, but the most likely answer is they don't have one. Hacking is opportunistic. They likely targeted everyone in the campaign, and these were the only victims they could hack. It's probably not the outcome they were hoping for.

But since they've gone through all the work, it'd be a shame to waste it. Thus, they are likely releasing the dump not because they believe it will do any good, but because it'll do them no harm.

If there's any plan, it's probably a long range one, serving notice that any political candidate that goes against Putin will have to deal with Russian hackers dumping email.


Why now? Why not leak bits over time like with Clinton?

France has a campaign blackout starting tonight at midnight until the election on Sunday. Thus, it's the perfect time to leak the files. Anything salacious, or even rumors of something bad, will spread virally through Facebook and Twitter, without the candidate or the media having a good chance to rebut the allegations.

The last emails in the logs appear to be from April 24, the day after the first round vote (Sunday's vote is the second, runoff, round). Thus, the hackers could've leaked this dump any time in the last couple weeks. They chose now to do it.


Are the emails verified?

Yes and no.

Yes, we have DKIM signatures between people's accounts, so we know for certain that hackers successfully breached these accounts. DKIM is an anti-spam method that cryptographically signs emails by the sending domain (e.g. @gmail.com), and thus, can also verify the email hasn't been altered or forged.
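
For the technically curious, checking such a signature yourself is not hard. Here is a sketch using the third-party dkimpy Python library; the filename is made up, and you would feed it a raw message taken from the dump:

import dkim   # pip install dkimpy

with open("message.eml", "rb") as f:
    raw = f.read()

# dkim.verify() parses the DKIM-Signature header, fetches the signing
# domain's public key from DNS, and checks the signed headers and body.
print("signature verifies" if dkim.verify(raw) else "no valid signature")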

But no, when a salacious email or document is found in the dump, it'll likely not have such a signature (most emails don't), and thus, we probably won't be able to verify the scandal. In other words, the hackers could have altered or forged something that becomes newsworthy.


What are the most salacious emails/files?

I don't know. Before this dump, hackers on 4chan were already making allegations that Macron had secret offshore accounts (debunked). Presumably we need to log in to 4chan tomorrow for them to point out salacious emails/files from this dump.

Another email going around seems to indicate that Alain Tourret, a member of the French legislature, had his assistant @FrancoisMachado buy drugs online with Bitcoin and had them sent to his office in the legislature building. The drug in question, 3-MMC, is a variant of meth that might be legal in France. The emails point to a tracking number which looks legitimate, at least showing that a package was indeed shipped to that area of Paris. There is a bitcoin transaction that matches the address, time, and amount specified in the emails. Some claim these drug emails are fake, but so far, I haven't seen any explanation of why they should be fake. On the other hand, there's nothing proving they are true (no DKIM sig), either.

Some salacious emails might be obvious, but some may take people with more expertise to find. For example, one email is a receipt from Uber (with proper DKIM validation) that shows the route that "Quenten" took on the night of the first round election. Somebody clued into the French political scene might be able to figure out he's visiting his mistress, or something. (This is hypothetical -- in reality, he's probably going from one campaign rally to the next).


What's the Macron camp's response?

They have just the sort of response you'd expect.

They claim some of the documents/emails are fake, without getting into specifics. They claim that the information needs to be understood in context. They claim that this was a "massive coordinated attack", even though it's something that any pimply faced teenager can do. They claim it's an attempt to destabilize democracy. They call upon journalists to be "responsible".


by Robert Graham (noreply@blogger.com) at June 21, 2017 02:04 AM

June 20, 2017

Evaggelos Balaskas

No Place To Hide

noplacetohide.jpg

An Amazing Book!!!

Must Read !!

I’ve listened to the audiobook like in two days.
Couldnt leave it down.

Then organize a CryptoParty at your local hackerspace

Tag(s): books

June 20, 2017 10:10 PM

Sean's IT Blog

Introducing Horizon 7.2

What’s new in Horizon 7.2?  A lot, actually.  There are some big features that will impact customers greatly, some beta features are now general availability, and some scalability enhancements. 

Horizon 7 Helpdesk Tool

One of the big drawbacks of Horizon is that it doesn’t have a very good tool for Helpdesk staff to support the environment.  While Horizon Administrator has role-based access control to limit what non-administrators can access and change, it doesn’t provide a good tool to enable a helpdesk technician to look up what desktop a user is in, grab some key metrics about their session, and remote in to assist the user.

VMware has provided a fling that can do some of this – the Horizon Toolbox.  But flings aren’t officially supported and aren’t always updated to support the latest version. 

Horizon 7.2 addresses this issue with the addition of the Horizon Helpdesk tool.  Horizon Helpdesk is an HTML5-based web interface designed for level 1 and 2 support desk engineers.  Unlike Citrix Director, which offers similar features for XenApp and XenDesktop environments, Horizon Helpdesk is integrated into the Connection Servers and installed by default.

Horizon Helpdesk provides a tool for helpdesk technicians who are supporting users with Horizon virtual desktops and published applications.  They're able to log into the web-based tool, search for specific users, and view their active sessions and desktop and application entitlements.  Technicians can view details about existing sessions, including various metrics from within the virtual desktops, and use the tool to launch a remote assistance session.

Skype for Business Support is Now GA

The Horizon Virtualization Pack for Skype for Business was released under Technical Preview with Horizon 7.1.  It’s now fully supported on Windows endpoints in Horizon 7.2, and a private beta will be available for Linux-based clients soon. 

The Virtualization Pack allows Horizon clients to offload some of the audio and video features of Skype for business to the endpoint, and it provides optimized communication paths directly between the endpoints for multimedia calls.  What exactly does that mean?  Well, prior to the virtualization pack, multimedia streams would need to be sent from the endpoint to the virtual desktop, processed inside the VM, and then sent to the recipient.  The virtualization pack offloads the multimedia processing to the endpoint, and all media is sent between endpoints using a separate point-to-point stream.  This happens outside of the normal display protocol virtual channels.


Skype for Business has a number of PBX features, but not all of these features are supported in the first production-ready version of this plugin.  The following features are not supported:

  • Multiparty Conferencing
  • E911 Location Services
  • Call to Response Group
  • Call Park and Call Pickup from Park
  • Call via X (Work/Home, Cell, etc)
  • Customized Ringtones
  • Meet Now Conferencing
  • Call Recording

Instant Clone Updates

A few new features have been added to Horizon Instant Clones.  These new features are the ability to reuse existing Active Directory Computer accounts, support for configuring SVGA settings – such as video memory, resolution and number of monitors supported –  on the virtual desktop parent, and support for placing Instant Clones on local datastores.

Scalability Updates

VMware has improved Horizon scalability in each of the previous releases, usually by expanding the number of connections, Horizon Pods, and sites that are supported in a Cloud Pod Architecture.  This release is no exception.  CPA now supports up to 120,000 sessions across 12 Horizon Pods in five sites.  While the number of sites has not increased, the number of supported sessions has increased by 40,000 compared to Horizon 7.1.

This isn’t the only scalability enhancement in Horizon 7.2.  The largest enhancement is in the number of desktops that will be supported per vCenter Server.  Prior to this version of Horizon, only 2000 desktops of any kind were supported per vCenter.  This means that a Horizon Pod that supported that maximum number of sessions would require five vCenter servers.  Horizon 7.2 doubles the number of supported desktops per vCenter, meaning that each vCenter Server can now support up to 4000 desktops of any type. 

Other New Features

Some of the other new features in Horizon 7.2 are:

  • Support for deploying Full Clone desktops to DRS Storage Clusters
  • A Workspace One mode to configure access policies around desktops and published applications
  • Client Drive Redirection and USB Redirection are now supported on Linux
  • The ability to rebuild full clone desktops and reuse existing machine accounts.

Key Takeaways

The major features of Horizon 7.2 are very attractive, and they address things that customers have been asking about for a while.  If you're planning a greenfield environment, or it's been a while since you've upgraded, there are a number of compelling reasons to look at implementing this version.


by seanpmassey at June 20, 2017 05:17 PM

ma.ttias.be

MariaDB MaxScale 2.1 defaulting to IPv6

The post MariaDB MaxScale 2.1 defaulting to IPv6 appeared first on ma.ttias.be.

This little upgrade caught me by surprise. In a MaxScale 2.0 to 2.1 upgrade, MaxScale changes the default bind address from IPv4 to IPv6. It's mentioned in the release notes like this:

MaxScale 2.1.2 added support for IPv6 addresses. The default interface that listeners bind to was changed from the IPv4 address 0.0.0.0 to the IPv6 address ::. To bind to the old IPv4 address, add address=0.0.0.0 to the listener definition.

Upgrading MariaDB MaxScale from 2.0 to 2.1

The result is pretty significant though, because authentication in MySQL is often host or IP based, with permissions being granted like this.

$ SET PASSWORD FOR 'xxx'@'10.0.0.1' = PASSWORD('your_password');

Notice the explicit use of IP address there.

Now, after a MaxScale 2.1 upgrade, it'll default to an IPv6 address for authentication, which gives you the following error message:

$ mysql -h127.0.0.1 -P 3306 -uxxx -pyour_password
ERROR 1045 (28000): Access denied for user 'xxx'@'::ffff:127.0.0.1' (using password: YES)

Notice how 127.0.0.1 turned into ::ffff:127.0.0.1? That's an IPv4 address being encapsulated in an IPv6 address. And it'll cause MySQL authentication to potentially fail, depending on how you assigned your users & permissions.

To fix, you can either;

  • Downgrade MaxScale from 2.1 back to 2.0 (add the 2.0.6 repositories for your OS and downgrade MaxScale)
  • Add the address=0.0.0.0 setting to your listener configuration in /etc/maxscale.cnf (see the sketch below)
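
For the second option, the listener section ends up looking something like this (the section, service and port values below are placeholders; the address= line is the only relevant change):

[Read-Write-Listener]
type=listener
service=Read-Write-Service
protocol=MariaDBClient
port=3306
address=0.0.0.0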

Hope this helps!

The post MariaDB MaxScale 2.1 defaulting to IPv6 appeared first on ma.ttias.be.

by Mattias Geniar at June 20, 2017 05:04 AM

June 19, 2017

Everything Sysadmin

Sarah Allen

customer-driven development

Sometimes when I talk about customer use cases for software that I’m building, it confuses the people I work with. What does that have to do with engineering? Is my product manager out on leave and I’m standing in?

I’ve found that customer stories connect engineers to the problems we’re solving. When we’re designing a system, we ask ourselves “what if” questions all the time. We need to explore the bounds of the solutions we’re creating, understanding the edge cases. We consider the performance characteristics, scalability requirements and all sorts of other important technical details. Through all this we imagine how the system will interact. It is easier when the software has a human interface, when it interacts with regular people. It is harder, but just as important, when we write software that only interacts with other software systems.

Sometimes the software that we are building can seem quite unrelated to the human world, but it isn't at all. We then need to understand the bigger system we're building. At some point, the software has a real-world impact, and we need to understand that, or we can miss creating the positive effects we intend.

On many teams over many years, I’ve had the opportunity to work with other engineers who get this. When it works there’s a rhythm to it, a heartbeat that I stop hearing consciously because it is always there. We talk to customers early and often. We learn about their problems, the other technologies they use, and what is the stuff they understand that we don’t. We start describing our ideas for solutions in their own words. This is not marketing. This influences what we invent. Understanding how to communicate about the thing we’re building changes what we build. We imagine this code we will write, that calls some other code, which causes other software to do a thing, and through all of the systems and layers there is some macro effect, which is important and time critical. Our software may have the capability of doing a thousand things, but we choose the scenarios for performance testing, we decide what is most normal, most routine, and that thing needs to be tied directly to those real effects.

Sometimes we refer to “domain knowledge” if our customers have special expertise, and we need to know that stuff, at least a little bit, so we can relate to our customers (and it’s usually pretty fun to explore someone else’s world a bit). However, the most important thing our customers know, that we need to discover, is what will solve their problems. They don’t know it in their minds — what they describe that they think will solve their problems often doesn’t actually. They know it when they feel it, when they somehow interact with our software and it gives them agency and amplifies their effect in the world.

Our software works when it empowers people to solve their problems and create impact. We can measure that. We can watch that happen. For those of us who have experienced that as software engineers, there’s no going back. We can’t write software any other way.

Customer stories, first hand knowledge from the people whose problems I’m solving spark my imagination, but I’m careful to use those stories with context from quantitative data. I love the product managers who tell me about rapidly expanding markets and how they relate to the use cases embedded in those stories, and research teams who validate whether the story I heard the other day is common across our customers or an edge case. We build software on a set of assumptions. Those assumptions need to be based on reality, and we need to validate early and often whether the thing we’re making is actually going to have a positive effect on reality.

Customer stories are like oxygen to a development team that works like this. Research and design teams who work closely with product managers are like water. When other engineers can't explain the customer use cases for their software, when they argue about what the solution should be based only on the technical requirements, sometimes I can't breathe. Then I find the people who are like me, who work like this, and I can hear a heartbeat, I can breathe again, and it feels like this thing we are making just might be awesome.

by sarah at June 19, 2017 01:08 PM

June 18, 2017

Debian Administration

Debian Stretch Released

Today the Debian project is pleased to announce the release of the next stable release of Debian GNU/Linux, code-named Stretch.

by Steve at June 18, 2017 03:44 AM

June 17, 2017

OpenSSL

Removing Some Code

This is another update on our effort to re-license the OpenSSL software. Our previous post in March was about the launch of our effort to reach all contributors, with the hope that they would support this change.

So far, about 40% of the people have responded. For a project that is as old as OpenSSL (including its predecessor, SSLeay, it's around 20 years old) that's not bad. We'll be continuing our efforts over the next couple of months to contact everyone.

Of those people responding, the vast majority have been in favor of the license change – fewer than a dozen objected. This post describes what we're doing about those and how we came to our conclusions. The goal is to be very transparent and open with our processes.

First, we want to mention that we respect their decision. While it is inconvenient to us, we acknowledge their rights to keep their code under the terms that they originally agreed to. We are asking permission to change the license terms, and it is fully within their rights to say no.

The license website is driven by scripts that are freely available in the license section of our tools repository on GitHub. When we started, we imported every single git commit, looked for anything that resembled an email address, and created tables for each commit, each user we found, and a log that connects users to commits. This did find false positives: sometimes email Message-ID’s showed up, and there were often mentions of folks just as a passing side-comment, or because they were in the context diff. (That script is also in the tools repository, of course.)

The user table also has columns to record approval or rejection of the license change, and comments. Most of the comments have been supportive, and (a bit surprisingly) only one or two were obscene.

The whattoremove script finds the users who refused, and all commits they were named in. We then examined each commit – there were 86 in all – and filtered out those that were cherry-picks to other releases. We are planning on only changing the license for the master branch, so the other releases branches aren’t applicable. There were also some obvious false positives. At the end of this step, we had 43 commits to review.

We then re-read every single commit, looking for what we could safely ignore as not relevant. We found the following:

  • Nine of them showed up only because the person was mentioned in a comment.
  • Sixteen of them were changes to the OpenBSD configuration data. The OpenSSL team had completely rewritten this, refactoring all BSD configurations into a common hierarchy, and the config stuff changed a great deal for 1.1.0.
  • Seven were not copyrightable because they were minimal changes (e.g., fixing a typo in a comment).
  • One was a false positive.

This left us with 10 commits. Two of them were about the CryptoDev engine. We are removing that engine, as can be seen in this PR, but we expect to have a replacement soon (for at least Linux and FreeBSD). As for the other commits, we are removing that code, as can be seen in this first PR and this second PR. Fortunately, none of them are particularly complex.

Copyright, however, is a complex thing – at times it makes debugging look simple. If there is only one way to write something, then that content isn’t copyrightable. If the work is very small, such as a one-line patch, then it isn’t copyrightable. We aren’t lawyers, and have no wish to become such, but those viewpoints are valid. It is not official legal advice, however.

For the next step, we hope that we will be able to “put back” the code that we removed. We are looking to the community for help, and encourage you to create the appropriate GitHub pull requests. We ask the community to look at that second PR and provide fixes.

And finally, we ask everyone to look at the list of authors and see if their name, or the name of anyone they know, is listed. If so please email us.

June 17, 2017 12:00 PM

June 16, 2017

Everything Sysadmin

Stackoverflow... but for questions about how to teach CS

My high school had only one computer science teacher. When she had a problem or question, she had no where to turn. I can't imagine how isolating and stressful that must have been.

cseducators.stackexchange.com is a Stack Exchange website for computer science educators to ask questions and share successful teaching techniques. It was in private beta until this week. Now everyone can access it.

Are you an educator looking for advice about how to integrate Git into an introductory CS class? Maybe you need to find a better analogy to help a student that doesn't understand 0-indexing? How do you convince students that well-indented code is worth the effort? What's a good emergency lesson plan you can substitute if the computers are down?

These are the kind of questions that computer science educators have and they're getting help right now at cseducators.stackexchange.com! If you are an educator, check it out. If you aren't, please consider helping answer their questions.

Most CS education funding goes towards hardware and software purchases, leaving educators very little support to help them be better teachers. I hope that CSEducators helps to fill that gap.

I hope the traffic to this site is enough that the site graduates from Beta to become a real site. This could be very powerful.

P.S. If you would like to discuss the site itself, there is a separate site for that called cseducators.meta.stackexchange.com.

by Tom Limoncelli at June 16, 2017 05:00 PM

June 15, 2017

Everything Sysadmin

NYC DevOps meeting next week: Measuring real-world DNS performance at Stack Overflow

Next week's NYCDevOps Meetup speaker is my co-worker, Mark Henderson, on the topic of "Measuring real-world DNS performance at Stack Overflow".

An in-depth look at how Stack Overflow records real-world DNS performance, and how you can do it too. You'll learn how we measured DNS performance when picking a DNS vendor, deciding whether or not to set up dual-DNS providers, and more.

The meeting is Tuesday, June 20, 2017, 7:00 PM at the Stack Overflow HQ in New York City.

For complete information and to RSVP, visit http://meetu.ps/39FgcP. Space is limited. RSVP soon!

by Tom Limoncelli at June 15, 2017 07:00 PM

Errata Security

Notes on open-sourcing abandoned code

Some people want a law that compels companies to release their source code for "abandoned software", in the name of cybersecurity, so that customers who bought it can continue to patch bugs long after the seller has stopped supporting the product. This is a bad policy, for a number of reasons.


Code is Speech

First of all, code is speech. That was the argument for why Phil Zimmermann could print the source code to PGP in a book, ship it overseas, and then have somebody scan the code back into a computer. Compelled speech is a violation of free speech. That was one of the arguments in the Apple vs. FBI case, where the FBI demanded that Apple write code for them, compelling speech.

Compelling the opening of previously closed source is compelled speech.

There might still be legal arguments that get away with it. After all, the state already compels some speech, such as warning labels, where it serves a narrow, legitimate government interest. So the courts may allow it. Also, like many free-speech issues (e.g. the legality of hate-speech), people may legitimately disagree with the courts about what "is" legal and what "should" be legal.

But here's the thing. What rights "should" be protected changes depending on what side you are on. Whether something deserves the protection of "free speech" depends upon whether the speaker is "us" or the speaker is "them". If it's "them", then you'll find all sorts of reasons why their speech is a special case and why it doesn't deserve protection.

That's what's happening here. The legitimate government purpose of "product safety" looms large, while "code is speech" doesn't, because they hate closed-source code, and hate Microsoft in particular. The open-source community has been strong on "code is speech" when it applies to them, but weak when it applies to closed-source.

Define abandoned

What, precisely, does 'abandoned' mean? Consider Windows 3.1. Microsoft hasn't sold it for decades. Yet, it's not precisely abandoned either, because they still sell modern versions of Windows. Being forced to show even 30-year-old source code would give competitors a significant advantage in creating Windows-compatible code like WINE.

When code is truly abandoned, such as when the vendor has gone out of business, chances are good they don't have the original source code anyway. Thus, in order for this policy to have any effect, you'd have to force vendors to give a third-party escrow service a copy of their code whenever they release a new version of their product.

All the source code

And that is surprisingly hard and costly. Most companies do not precisely know what source code their products are based upon. Yes, technically, all the code is in that ZIP file they gave to the escrow service, but it doesn't build. Essential build steps are missing, so that source code won't compile. It's like the dependency hell that many open-source products experience, such as downloading and installing two different versions of Python at different times during the build. Except, it's a hundred times worse.

Oftentimes building closed-source software itself requires an obscure version of a closed-source tool that has itself been abandoned by its original vendor. You often can't even define what the source code is. For example, engine control units (ECUs) are Matlab code that compiles down to C, which is then integrated with other C code, all of which is compiled (using a special compiler) into the final binary. Unless you have all these closed-source products, some of which are no longer sold, the source code to the ECU will not help you patch bugs.

For small startups running fast, such as off Kickstarter, forcing them to escrow code that actually builds would force upon them an undue burden, harming innovation.

Binary patch and reversing

Then there is the issue of why you need the source code in the first place. Here's the deal with binary exploits like buffer-overflows: if you know enough to exploit it, you know enough to patch it. Just add some binary code onto the end of the program that verifies the input, then replace the spot where the vulnerability happens with a jump instruction to the new code.

I know this is possible and fairly trivial because I've done it myself. Indeed, one of the reasons Microsoft requires signed kernel components is specifically because they got tired of me patching the live kernel this way (and they almost sued me for reverse engineering their code in violation of their EULA).

Given the aforementioned difficulties in building software, this would be the easier option for third parties trying to fix bugs. The only reason closed-source companies don't do this already is because they need to fix their products permanently anyway, which involves checking in the change into their source control systems and rebuilding.

Conclusion

So what we see here is that there is no compelling benefit to forcing vendors to release code for "abandoned" products, while at the same time, there are significant costs involved, not the least of which is a violation of the principle that "code is speech".

It doesn't exist as a serious proposal. It only exists as a way to support open-source advocacy and security advocacy. Both would gladly stomp on your rights and drive up costs in order to achieve their higher moral goal.





Bonus: so let's say you decide that "Windows XP" has been abandoned, which is exactly the intent of proponents. The thinking is that we (the open-source community) would then be able to continue to support WinXP and patch bugs.

But what we'd see instead is a lot more copies of WinXP floating around, with vulnerabilities, as people decided to use it instead of paying hundreds of dollars for a new Windows 10 license.

Indeed, part of the reason for Microsoft abandoning WinXP is that it's riddled with flaws that can't practically be fixed, whereas the new features of Win10 fundamentally fix them. Getting rid of SMBv1 is just one of many examples.

by Robert Graham (noreply@blogger.com) at June 15, 2017 09:43 AM

June 14, 2017

Steve Kemp's Blog

Porting pfctl to Linux

If you have a bunch of machines running OpenBSD for firewalling purposes, which is pretty standard, you might start to use source-control to maintain the rulesets. You might go further, and use some kind of integration testing to deploy changes from your revision control system into production.

Of course before you deploy any pf.conf file you need to test that the file contents are valid/correct. If your integration system doesn't run on OpenBSD though you have a couple of choices:

  • Run a test-job that SSHes to the live systems and tests syntax (see the sketch after this list).
    • Via pfctl -n -f /path/to/rules/pf.conf.
  • Write a tool on your Linux hosts to parse and validate the rules.
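
For the first option, a minimal test-job might look something like the sketch below; the hostname and paths are placeholders, and it assumes key-based SSH access to a live OpenBSD firewall:

  #!/bin/sh
  # Sketch of a CI job that validates a candidate pf.conf on a live
  # OpenBSD host. fw1.example.com and the paths are assumptions.
  set -eu

  RULES="pf.conf"
  HOST="fw1.example.com"

  # Copy the candidate ruleset over, then parse it without loading it (-n).
  scp "${RULES}" "${HOST}:/tmp/pf.conf.candidate"
  ssh "${HOST}" "pfctl -n -f /tmp/pf.conf.candidate"

  echo "pf.conf syntax OK"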

I looked at the second option last year and got pretty far, but then got distracted. So the other day I picked it up again. It turns out that if you're patient it's not hard to use bison to generate some C code, then glue it together such that you can validate your firewall rules on a Linux system.

  deagol ~/pf.ctl $ ./pfctl ./pf.conf
  ./pf.conf:298: macro 'undefined_variable' not defined
  ./pf.conf:298: syntax error

Unfortunately I had to remove quite a lot of code to get the tool to compile, which means that while some failures like the one above are caught, others are missed. The example above reads:

vlans="{vlan1,vlan2}"
..
pass out on $vlans proto udp from $undefined_variable

Unfortunately the following line does not raise an error:

pass out on vlan12 inet proto tcp from <unknown> to $http_server port {80,443}

That comes about because looking up the value of the table named unknown just silently fails. In slowly removing more and more code to make it compile I lost the ability to keep track of table definitions, both their names and their values. Thus fetching a table by name has become a no-op, and a bogus name will result in no error.

Now it is possible, with more care, that you could use a hashtable library, or similar, to simulate these things. But I kinda stalled, again.

(Similar things happen with fetching a proto by name, I just hardcoded inet, gre, icmp, icmp6, etc. Things that I'd actually use.)

Might be a fun project for somebody with some time anyway! Download the OpenBSD source, e.g. from a github mirror - yeah, yeah, but still. CVS? No thanks! - Then poke around beneath sbin/pfctl/. The main file you'll want to grab is parse.y, although you'll need to setup a bunch of headers too, and write yourself a Makefile. Here's a hint:

  deagol ~/pf.ctl $ tree
  .
  ├── inc
  │   ├── net
  │   │   └── pfvar.h
  │   ├── queue.h
  │   └── sys
  │       ├── _null.h
  │       ├── refcnt.h
  │       └── tree.h
  ├── Makefile
  ├── parse.y
  ├── pf.conf
  ├── pfctl.h
  ├── pfctl_parser.h
  └── y.tab.c

  3 directories, 11 files
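
To act on that hint, a rough sketch of the fetch-and-generate step might look like the following; the mirror URL and branch name are assumptions, and as noted above you will still have to supply the headers, write a Makefile, and stub out missing code before anything compiles:

  # Sketch only: fetch parse.y from a GitHub mirror of the OpenBSD tree
  # and generate the parser with bison.
  mkdir -p pf.ctl && cd pf.ctl
  curl -O https://raw.githubusercontent.com/openbsd/src/master/sbin/pfctl/parse.y
  curl -O https://raw.githubusercontent.com/openbsd/src/master/sbin/pfctl/pfctl_parser.h

  # -y keeps the traditional yacc output name, y.tab.c.
  bison -y parse.y

  # y.tab.c still needs the headers under inc/ and a Makefile of your
  # own (plus a fair amount of stubbing) before it will build.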

June 14, 2017 09:00 PM

Colin Percival

Oil changes, safety recalls, and software patches

Every few months I get an email from my local mechanic reminding me that it's time to get my car's oil changed. I generally ignore these emails; it costs time and money to get this done (I'm sure I could do it myself, but the time it would cost is worth more than the money it would save) and I drive little enough — about 2000 km/year — that I'm not too worried about the consequences of going for a bit longer than nominally advised between oil changes. I do get oil changes done... but typically once every 8-12 months, rather than the recommended 4-6 months. From what I've seen, I don't think I'm alone in taking a somewhat lackadaisical approach to routine oil changes.

June 14, 2017 04:40 AM

June 13, 2017

Evaggelos Balaskas

Failures will occur, even with ansible and version control systems!

Failures

Every SysAdmin, DevOps engineer, SRE, Computer Engineer or even Developer knows that failures WILL occur. So you need to plan with that constant in mind. Failure causes can be present in hardware, power, the operating system, networking, memory or even bugs in software. We often call them system failures, but it is possible that a human can also be the cause of such a failure!

Listening to the stories on the latest episode of the Stack Overflow podcast, I felt compelled to share my own oh-shit moment from recent history.

I am writing this article so others can learn from this story, as I did in the process.

Rolling upgrades

I am a really big fan of rolling upgrades.

I am used to working with server farms. In a nutshell that means a lot of servers connected to their own switch behind routers/load balancers. This architecture gives me a great opportunity when it is time to perform operations, like scheduling service updates during working hours.

eg. Update software version 1.2.3 to 1.2.4 on serverfarm001

The procedure is really easy:

  • From the load balancers, stop any new incoming traffic to one of the servers (see the sketch after this list).
  • Monitor all processes on the server and wait for them to terminate.
  • When all connections hit zero, stop the service you want to upgrade.
  • Perform the service update
  • Testing
  • Monitor logs and possible alarms
  • Be ready to rollback if necessary
  • Send some low traffic and try to test service with internal users
  • When everything is OK, tell the load balancers to send more traffic
  • Wait, monitor everything, test, be sure
  • Revert the changes on the load balancers so that the specific server takes an equal share of traffic/connections again, like the others.
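
The post does not say which load balancer is in use, so purely as an illustration of the first two steps, here is what draining a single server might look like if the load balancer happened to be HAProxy with its admin socket enabled (names and paths are hypothetical):

# Hypothetical illustration: drain server01 behind an HAProxy load balancer
# whose admin socket is enabled at /var/run/haproxy.sock (level admin).
echo "disable server serverfarm001/server01" | socat stdio /var/run/haproxy.sock

# Watch the current session count (scur) for server01 drop to zero
# before stopping and upgrading the service on it.
echo "show stat" | socat stdio /var/run/haproxy.sock | grep server01

# Once the upgrade has been tested, put the server back in rotation.
echo "enable server serverfarm001/server01" | socat stdio /var/run/haproxy.sock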

serverfarm.jpg

This procedure is well established in such environments, and gives us the benefit of working with the whole team during working hours without the need to schedule a maintenance window in the middle of the night, when customer traffic is low. During the day, if something is not going as planned, we can also reach other departments and work with them to figure out what is happening.

Configuration Management

We are using ansible as the main configuration management tool. Every file, playbook, role, and task of ansible is under a version control system, so that we can review changes before applying them to production. Viewing diffs in a modern web tool can be a lifesaver these days.

Virtualization

We also use docker images or virtual machine images as development machines, so that we can apply any new configuration, update or upgrade on those machines and test it there.

Ansible Inventory

To perform service updates with ansible on servers, we are using the ansible inventory to host some metadata (aka variables) for every machine in a serverfarm. Let me give you an example:

[serverfarm001]
server01 version=1.2.3
server02 version=1.2.3
server03 version=1.2.3
server04 version=1.2.4

And performing the update action via ansible limits

eg.

~> ansible-playbook serverfarm001.yml -t update -C -D -l server04


Rollback

When something is not going as planned, we revert the changes in ansible (version control) and re-push the previous configuration to the system. Remember, the system is not getting any traffic from the front-end routers.


The Update

I was ready to do the update. Nagios was open, logs were being tailed with -f

and then:

~> ansible-playbook serverfarm001.yml -t update

The Mistake

I ran the ansible-playbook without limiting it to the server I wanted to update!!!

So all the new changes were pushed to all servers at once!

On top of that, the new configuration broke the running software, which was still on the previous version. When the restart handler for the service was triggered, every server simply stopped!!!

Funny thing: the updated machine, server04, worked perfectly, but no traffic was reaching it through the load balancers.


Activate Rollback

It was time to run the rollback procedure.

Reverting changes from version control is easy. Took me like a moment or something.
Running again:

~> ansible-playbook serverfarm001.yml

and …


Waiting for Nagios

In 2.5 minutes I had fixed the error and was waiting for nagios to be green again.

Then … Nothing! Red alerts everywhere!


Oh-Shit Moment

It was time for me to inform everyone about what I had done,
explaining the mistake to my colleagues and manager and trying to figure out what went wrong with the rollback procedure.


Collaboration

At this crucial moment everything else worked like clockwork.

My colleagues took every action to:

  • inform the helpdesk
  • look for errors
  • tail logs
  • monitor graphs
  • watch nagios
  • talk to other people
  • do the leg-work in general

and leaving me in peace and calm to figure out what went wrong.

I felt so proud to be part of the team at that specific moment.

If any of you are reading this article: truly, thank you, guys and gals.


Work-Around

I bypassed ansible and copied the correct configuration to all servers via ssh (see the sketch below).
My colleagues were telling me the good news as I went through the ~xx servers one by one.
In 20 minutes everything was back to normal.
And finally nagios was green again.
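
The work-around itself was nothing fancy; something along the lines of the loop below, where the file names, host list and restart command are illustrative assumptions rather than the exact commands used that day:

# Illustrative sketch of the manual work-around, not the exact commands.
# Assumes the known-good configuration sits in /tmp/configuration.fixed
# and the affected hosts are listed, one per line, in hosts.txt.
for host in $(cat hosts.txt); do
    scp /tmp/configuration.fixed "${host}:/etc/service/configuration"
    ssh "${host}" "systemctl restart service"   # service name is a placeholder
done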


Blameless Post-Mortem

It was time for the post-mortem and, of course, for drafting the company’s incident report.

We already knew what happened and how, but nevertheless we needed to write everything down and keep a good timeline of all the steps.
This is not only for reporting but also for us. We needed to figure out exactly what happened: do we need more monitoring tools?
Can we place any failsafes in our procedures? And why didn't the rollback procedure work?


Fixing Rollback

I am writing this paragraph first, but to be honest with you, it took me some time to get to the bottom of this!

The rollback procedure actually works as planned. I made a mistake with the version control system.

What we have done is wrap ansible in another script so that we can select the version control revision number at runtime.
This is actually pretty neat, because it gives us the ability to run ansible with previous versions of our configuration without reverting the master branch.

The ansible wrapper asks for a revision, and by default we run it with [tip].

So the correct way to do rollbacks is:

eg.

~> ansible-playbook serverfarm001.yml -rev 238
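
The wrapper itself isn't shown in the post, but for illustration a stripped-down version could look like the sketch below; the repository path and the use of mercurial are assumptions based on the [tip] default mentioned above:

#!/bin/sh
# Minimal sketch of an ansible wrapper that checks out a chosen revision
# of the configuration repository before running a playbook.
PLAYBOOK="$1"; shift
REV="tip"                                  # default: run with the tip revision
if [ "${1:-}" = "-rev" ]; then REV="$2"; shift 2; fi

# Check out the requested revision into a scratch working copy.
hg clone -q -r "$REV" /srv/ansible-config "/tmp/ansible-config-$REV"

cd "/tmp/ansible-config-$REV"
ansible-playbook "$PLAYBOOK" "$@"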

At the time of the problem, I didn't do that. I thought it was better to revert all changes and re-run ansible.
But the wrapper was running in its default mode, with the tip revision!!

Although I usually manage pretty well in panic mode, that day my brain was frozen!


Re-Design Ansible

I wrapped my head around the problem and tried to find a better way of performing service updates. I needed something that could run safely without depending on ansible's limit option.

The answer was obvious less than five minutes later:

files/serverfarm001/1.2.3
files/serverfarm001/1.2.4

I need to keep a separate configuration folder per version and run my ansible playbooks with a variable instead of absolute paths.

eg.


- copy: src=files/serverfarm001/{{version}} dest=/etc/service/configuration

That will suffice next time (and actually did!). When the service upgrade is finished, we can simply remove the previous configuration folder without changing anything else in ansible.
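
With that layout in place, choosing the configuration for a given upgrade is just a matter of passing the version as a variable at runtime; the values below are only an example:

~> ansible-playbook serverfarm001.yml -t update -l server04 -e version=1.2.4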


Ansible Groups

Another (simpler) approach is to create a new group in the ansible inventory,
like you do with your staging vs. production environments.

eg.

[serverfarm001]
server01 version=1.2.3
server02 version=1.2.3
server03 version=1.2.3

[serverfarm001_new]
server04 version=1.2.4

and create a new yml file

---
- hosts: serverfarm001_new

and run the ansible-playbook against the new serverfarm001_new group.


Validation

A lot of services nowadays have syntax check commands for their configuration.

You can use this validation process in ansible!

here is an example from ansible docs:

# Update sshd configuration safely, avoid locking yourself out
- template:
    src: etc/ssh/sshd_config.j2
    dest: /etc/ssh/sshd_config
    owner: root
    group: root
    mode: '0600'
    validate: /usr/sbin/sshd -t -f %s
    backup: yes

or you can use registers like this:

  - name: Check named
    shell: /usr/sbin/named-checkconf -t /var/named/chroot
    register: named_checkconf
    changed_when: "named_checkconf.rc == 0"
    notify: anycast rndc reconfig


Conclusion

Everyone makes mistakes. I know I have had some oh-shit moments in my career for sure. Try to learn from these failures and educate others. Keep notes and write everything down in a wiki or in whatever documentation tool you are using internally. Always keep your calm. Do not hide any errors from your team or your manager. Be the first person to inform everyone. If the working environment doesn't make you feel safe making mistakes, perhaps you should think about changing scenery. You will make a mistake; failures will occur. It is a well known fact and you have to be ready when the time comes. Do a blameless post-mortem. The only way a team can get better is via responsibility, not blame. You need to perform disaster-recovery scenarios from time to time and test your backups. And always -ALWAYS- use a proper configuration management tool for all changes to your infrastructure.


post scriptum

After writing this draft, I had a talk with some friends about the cloud industry and how this experience could be applied to such an environment. The quick answer is that you SHOULD NOT.

Working with the cloud means you are mostly using virtualization. Docker images or even virtual machines should be ephemeral. When it’s time to perform upgrades (system patching or software upgrades) you should create new virtual machines that replace the old ones. There is no need to do it any other way. You can do a rolling replacement of the virtual machines (or docker images) without having to stop the service on a machine, do the upgrade, test, and put it back. Those ephemeral machines should not hold any data or logs in the first place. Cloud means that you can (auto) scale as needed without thinking about where the data lives.

thanks for reading.

Tag(s): failures, ansible

June 13, 2017 07:57 PM

OpenSSL

New Committers

We announced back in October that we would be changing from a single OpenSSL Project Team to having an OpenSSL management committee and a set of committers, which we planned to expand to enable greater involvement from the community.

Now that we have in place committer guidelines, we have invited the first set of external (non-OMC) community members to become committers and they have each accepted the invitation.

What this means for the OpenSSL community is that there is now a larger group of active developers who have the ability to review and commit code to our source code repository. These new committers will help the existing committers with our review process (i.e., their reviews count towards approval) which we anticipate will help us keep on top of the github issues and pull request queues.

The first group of non-OMC committers (in alphabetical order) are:

As always, we welcome comments on submissions and technical reviews of pull requests from anyone in the community.

Note: All submissions must be reviewed and approved by at least two committers, one of whom must also be an OMC member as described in the project bylaws

June 13, 2017 12:00 PM

pagetable

Building the Original Commodore 64 KERNAL Source

Many reverse-engineered versions of “KERNAL”, the C64's ROM operating system, exist, some of them even in a form that can be built into the original binary. But how about building the original C64 KERNAL source with the original tools?

The Commodore engineer Dennis Jarvis rescued many disks with Commodore source, which he gave to Steve Gray for preservation. One of these disks, c64kernal.d64, contains the complete source of the original KERNAL version (901227-01), including all build tools. Let’s build it!

Building KERNAL

First, we need a Commodore PET. Any PET will do, as long as it has enough RAM. 32 KB should be fine. For a disk drive, a 2040 or 4040 is recommended if you want to create the cross-reference database (more RAM needed for many open files), otherwise a 2031 will do just fine. The VICE emulator will allow you to configure such a system.

First you should delete the existing cross-reference files on disk. With the c1541 command line tool that comes with VICE, this can be done like this:

c1541 c64kernal.d64 -delete xr* -validate

Now attach the disk image and either have the emulator autostart it, or

LOAD"ASX4.0",8
RUN

This will run the “PET RESIDENT ASSEMBLER”. Now answer the questions like this:

PET RESIDENT ASSEMBLER V102780
(C) 1980 BY COMMODORE BUSINESS MACHINES

OBJECT FILE (CR OR D:NAME): KERNAL.HEX
HARD COPY (CR/Y OR N)? N
CROSS REFERENCE (CR/NO OR Y)? N
SOURCE FILE NAME? KERNAL

The assembler will now do its work and create the file “KERNAL.HEX” on disk. This will take about 16 minutes, so you probably want to switch your emulator into “Warp Mode”. When it is done, quit the emulator to make sure the contents of the disk image are flushed.

Using c1541, we can extract the output of the assembler:

c1541 c64kernal.d64 -read kernal.hex

The file is in MOS Technology Hex format and can be converted using srecord:

tr '\r' '\n' < kernal.hex > kernal_lf.hex
srec_cat kernal_lf.hex -MOS_Technologies \
-offset -0xe000 \
-fill 0xaa 0x0000 0x1fff \
-o kernal.bin -Binary

This command makes sure to fill the gaps with a value of 0xAA like the original KERNAL ROM.

The resulting file is identical with 901227-01 except for the following:

  • $E000-$E4AB is empty instead of containing the overflow BASIC code
  • $E4AC does not contain the checksum (the BASIC program “chksum” on the disk is used to calculate it)
  • $FFF6 does not contain the “RRBY” signature

Also note that $FF80, the KERNAL version byte is $AA – the first version of KERNAL did not yet use this location for the version byte, so the effective version byte is the fill byte.

Browsing the Source

An ASCII/LF version of the source code is available at https://github.com/mist64/cbmsrc.

;****************************************
;*                                      *
;* KK  K EEEEE RRRR  NN  N  AAA  LL     *
;* KK KK EE    RR  R NNN N AA  A LL     *
;* KKK   EE    RR  R NNN N AA  A LL     *
;* KKK   EEEE  RRRR  NNNNN AAAAA LL     *
;* KK K  EE    RR  R NN NN AA  A LL     *
;* KK KK EE    RR  R NN NN AA  A LL     *
;* KK KK EEEEE RR  R NN NN AA  A LLLLL  *
;*                                      *
;***************************************
;
;***************************************
;* PET KERNAL                          *
;*   MEMORY AND I/O DEPENDENT ROUTINES *
;* DRIVING THE HARDWARE OF THE         *
;* FOLLOWING CBM MODELS:               *
;*   COMMODORE 64 OR MODIFED VIC-40    *
;* COPYRIGHT (C) 1982 BY               *
;* COMMODORE BUSINESS MACHINES (CBM)   *
;***************************************
.SKI 3
;****LISTING DATE --1200 14 MAY 1982****
.SKI 3
;***************************************
;* THIS SOFTWARE IS FURNISHED FOR USE  *
;* USE IN THE VIC OR COMMODORE COMPUTER*
;* SERIES ONLY.                        *
;*                                     *
;* COPIES THEREOF MAY NOT BE PROVIDED  *
;* OR MADE AVAILABLE FOR USE ON ANY    *
;* OTHER SYSTEM.                       *
;*                                     *
;* THE INFORMATION IN THIS DOCUMENT IS *
;* SUBJECT TO CHANGE WITHOUT NOTICE.   *
;*                                     *
;* NO RESPONSIBILITY IS ASSUMED FOR    *
;* RELIABILITY OF THIS SOFTWARE. RSR   *
;*                                     *
;***************************************
.END

by Michael Steil at June 13, 2017 07:44 AM

June 01, 2017

Steve Kemp's Blog

So I accidentally wrote a linux security module

Tonight I read this week's LWN quotes page a little later than usual because I was busy at work for most of the day. Anyway, as always LWN's content was awesome, and this particular list led to an interesting discussion about a new Linux Security Module (LSM).

That read weirdly; what I was trying to say was that every Thursday morning I like to read LWN at work, and tonight was the first chance I had to get round to it.

One of the later replies in the thread was particularly interesting as it said:

Suggestion:

Create an security module that looks for the attribute

    security.WHITELISTED

on things being executed/mmapped and denys it if the attribute
isn't present. Create a program (whitelistd) that reads
/etc/whitelist.conf and scans the system to ensure that only
things on the list have the attribute.

So I figured that was a simple idea, and it didn't seem too hard even for me, a non-kernel non-developer. There are several Linux security modules included in the kernel releases, beneath the top-level security/ directory, so I assumed I could copy & paste code from them to get something working.

During the course of all this work, which took about 90 minutes from start to Finnish (that pun never gets old), this online documentation was enormously useful:

Brief attr primer

If you're not familiar with the attr tool it's pretty simple. You can assign values to arbitrary labels on files. The only annoying thing is you have to use extra-flags to commands like rsync, tar, cp, etc, to preserve the damn things.

Set three attributes on the file named moi:

$ touch moi
$ attr -s forename -V "Steve"      moi
$ attr -s surname  -V "Kemp"       moi
$ attr -s name     -V "Steve Kemp" moi

Now list the attributes present:

$ attr -l moi
Attribute "name" has a 10 byte value for moi
Attribute "forename" has a 5 byte value for moi
Attribute "surname" has a 4 byte value for moi

And retrieve one?

$ attr -q -g name moi
Steve Kemp

LSM Skeleton

My initial starting point was to create "steve_lsm.c", with the following contents:

 #include <linux/lsm_hooks.h>

 /*
  * Log things for the moment.
  */
 static int steve_bprm_check_security(struct linux_binprm *bprm)
 {
     printk(KERN_INFO "STEVE LSM check of %s\n", bprm->filename);
     return 0;
 }

 /*
  * Only check exec().
  */
 static struct security_hook_list steve_hooks[] = {
     LSM_HOOK_INIT(bprm_check_security, steve_bprm_check_security),
 };

 /*
  * Somebody set us up the bomb.
  */
 static void __init steve_init(void)
 {
     security_add_hooks(steve_hooks, ARRAY_SIZE(steve_hooks), "steve");
     printk(KERN_INFO "STEVE LSM initialized\n");
 }

With that in place I had to modify the various KBuild files beneath security/ to make sure this could be selected as an LSM, and add in a Makefile to the new directory security/steve/.

With the boilerplate done, though, and the host machine rebooted into my new kernel, it was simple to test things out.

Obviously the first step, post-boot, is to make sure that the module is active, which can be done in two ways, looking at the output of dmesg, and explicitly listing the modules available:

 ~# dmesg | grep STEVE | head -n2
 STEVE LSM initialized
 STEVE LSM check of /init

 $ echo $(cat /sys/kernel/security/lsm)
 capability,steve

Making the LSM functional

The next step was to make the module do more than mere logging. In short this is what we want:

  • If a binary is invoked by root - allow it.
    • Although note that this might leave a hole, if the user can enter a new namespace where their UID is 0..
  • If a binary is invoked by a non-root user look for an extended attribute on the target-file named security.WHITELISTED.
    • If this is present we allow the execution.
    • If this is missing we deny the execution.

NOTE we don't care what the content of the extended attribute is, we just care whether it exists or not.

Reading the extended attribute is pretty simple, using the __vfs_getxattr function. All in all our module becomes this:

  #include <linux/xattr.h>
  #include <linux/binfmts.h>
  #include <linux/lsm_hooks.h>
  #include <linux/sysctl.h>
  #include <linux/ptrace.h>
  #include <linux/prctl.h>
  #include <linux/ratelimit.h>
  #include <linux/workqueue.h>
  #include <linux/string_helpers.h>
  #include <linux/task_work.h>
  #include <linux/sched.h>
  #include <linux/spinlock.h>
  #include <linux/lsm_hooks.h>


  /*
   * Perform a check of a program execution/map.
   *
   * Return 0 if it should be allowed, -EPERM on block.
   */
  static int steve_bprm_check_security(struct linux_binprm *bprm)
  {
         // The current task & the UID it is running as.
         const struct task_struct *task = current;
         kuid_t uid = task->cred->uid;

         // The target we're checking
         struct dentry *dentry = bprm->file->f_path.dentry;
         struct inode *inode = d_backing_inode(dentry);

         // The size of the label-value (if any).
         int size = 0;

         // Root can access everything.
         if ( uid.val == 0 )
            return 0;

         size = __vfs_getxattr(dentry, inode, "user.whitelisted", NULL, 0);
         if ( size >= 0 )
         {
             printk(KERN_INFO "STEVE LSM check of %s resulted in %d bytes from 'user.whitelisted' - permitting access for UID %d\n", bprm->filename, size, uid.val );
             return 0;
         }

         printk(KERN_INFO "STEVE LSM check of %s denying access for UID %d [ERRO:%d] \n", bprm->filename, uid.val, size );
         return -EPERM;
  }

  /*
   * The hooks we wish to be installed.
   */
  static struct security_hook_list steve_hooks[] = {
       LSM_HOOK_INIT(bprm_check_security, steve_bprm_check_security),
  };

  /*
   * Initialize our module.
   */
  void __init steve_add_hooks(void)
  {
       /* register ourselves with the security framework */
       security_add_hooks(steve_hooks, ARRAY_SIZE(steve_hooks), "steve");

       printk(KERN_INFO "STEVE LSM initialized\n");
  }

Once again we reboot with this new kernel, and we test that the LSM is active. After the basic testing, as before, we can now test real functionality. By default no binaries will have the attribute we look for present - so we'd expect ALL commands to fail, unless executed by root. Let us test that:

~# su - nobody -s /bin/sh
No directory, logging in with HOME=/
Cannot execute /bin/sh: Operation not permitted

That looks like it worked. Let us allow users to run /bin/sh:

 ~# attr -s whitelisted -V 1 /bin/sh

Unfortunately that fails, because symlinks are weird, but repeating the test with /bin/dash works as expected:

 ~# su - nobody -s /bin/dash
 No directory, logging in with HOME=/
 Cannot execute /bin/dash: Operation not permitted

 ~# attr -s whitelisted -V 1 /bin/dash
 ~# attr -s whitelisted -V 1 /usr/bin/id

 ~# su - nobody -s /bin/dash
 No directory, logging in with HOME=/
 $ id
 uid=65534(nobody) gid=65534(nogroup) groups=65534(nogroup)

 $ uptime
 -su: 2: uptime: Operation not permitted

And our logging shows the useful results as we'd expect:

  STEVE LSM check of /usr/bin/id resulted in 1 bytes from 'user.WHITELISTED' - permitting access for UID 65534
  STEVE LSM check of /usr/bin/uptime denying access for UID 65534 [ERRO:-95]

Surprises

If you were paying careful attention you'll see that we changed what we did part-way through this guide.

  • The initial suggestion said to look for security.WHITELISTED.
  • But in the kernel module I look for user.whitelisted.
    • And when setting the attribute I only set whitelisted.

Not sure what is going on there, but it was very confusing. It appears to be the case that when you set an attribute a secret user. prefix is added to the name.

Could be worth some research by somebody with more time on their hands than I have.

Anyway I don't expect this is a terribly useful module, but it was my first, and I think it should be pretty stable. Feedback on my code certainly welcome!

June 01, 2017 09:00 PM

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – May 2017

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts/topics this
month:
  1. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here!
  2. “Simple Log Review Checklist Released!” is often at the top of this list – this aging checklist is still a very useful tool for many people. “On Free Log Management Tools” (also aged a bit by now) is a companion to the checklist (updated version)
  3. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009. Is it relevant now? You be the judge.  Succeeding with SIEM requires a lot of work, whether you paid for the software, or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” … 
  4. This month, my classic PCI DSS Log Review series is extra popular! The series of 18 posts cover a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book (now in its 4th edition!) – note that this series is mentioned in some PCI Council materials. 
  5. “SIEM Resourcing or How Much the Friggin’ Thing Would REALLY Cost Me?” is a quick framework for assessing the SIEM project (well, a program, really) costs at an organization (a lot more details on this here in this paper).
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has more than 5X of the traffic of this blog]: 

EPIC WIN post:
Current research on vulnerability management:
Current research on cloud security monitoring:
Recent research on security analytics and UBA / UEBA:
Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Previous post in this endless series:

by Anton Chuvakin (anton@chuvakin.org) at June 01, 2017 04:40 PM

May 31, 2017

ma.ttias.be

HTTP/2 push is tougher than I thought

The post HTTP/2 push is tougher than I thought appeared first on ma.ttias.be.

Nicely done post about the technical implementation & gotchas related to HTTP/2.

HTTP/2 push is more complicated and low-level than I initially thought, but what really caught me off-guard is how inconsistent it is between browsers – I'd assumed it was a done deal & totally ready for production.

This isn't an "HTTP/2 push is a douchebag" hatchet job – I think HTTP/2 push is really powerful and will improve over time, but I no longer think it's a silver bullet from a golden gun.

Source: HTTP/2 push is tougher than I thought -- JakeArchibald.com

The post HTTP/2 push is tougher than I thought appeared first on ma.ttias.be.

by Mattias Geniar at May 31, 2017 09:11 AM

May 29, 2017

Steve Kemp's Blog

Security is hard ..

3D-Printing

I continue to be impressed with the local vendors found on 3dhubs. I've had several more things printed out, including an "internet button" and some card-holders for Settlers of Catan.

The most recent print I had made was a collection of display cases, for holding an OLED display, as well as an ESP8266 device.

Unfortunately, at the same time as I was falling in love with the service, I discovered a glaring XSS vulnerability in the site itself.

Anybody who viewed my profile page could have arbitrary javascript executed, which in some cases would actually disclose their private details - such as:

  • Their forename & surname.
  • Their email-address.
  • Their telephone number.
  • Their GeoIP details.

Discovering this took minutes, writing it up took an hour, and a month later it is still unfixed.

I've deleted my account.

May 29, 2017 09:00 PM

May 25, 2017

Raymii.org

Openstack Horizon, remove the loading modal with uBlock Origin

The OpenStack dashboard, Horizon, is a great piece of software for managing your OpenStack resources via the web. However, it has, in my opinion, a very big usability issue: the loading dialog that appears after you click a link. It blocks the entire page and all other links, so whenever I click, I have to wait three to five seconds before I can do anything else. Clicked the wrong menu item? Sucks to be you, here, have some loading. Clicked a link and quickly want to open something in a new tab while the page is still loading? Nope, not today. It's not as if browsers have had a built-in way to show that a page is loading; no, of course, the loading indication that has been there forever is not good enough. Let's re-invent the wheel and significantly impact the user experience. With two rules in uBlock Origin this loading modal is removed and you can work normally again in Horizon.

May 25, 2017 12:00 AM

May 24, 2017

ma.ttias.be

Samba CVE-2017-7494: Remote Code Execution in Samba 3.5.0 and upwards

The post Samba CVE-2017-7494: Remote Code Execution in Samba 3.5.0 and upwards appeared first on ma.ttias.be.

If you run Samba, get patching.

CVE-2017-7494: All versions of Samba from 3.5.0 onwards are vulnerable to a remote code execution vulnerability, allowing a malicious client to upload a shared library to a writable share, and then cause the server to load and execute it.

Source: [Announce] Samba 4.6.4, 4.5.10 and 4.4.14 Available for Download

The post Samba CVE-2017-7494: Remote Code Execution in Samba 3.5.0 and upwards appeared first on ma.ttias.be.

by Mattias Geniar at May 24, 2017 12:15 PM

May 22, 2017

HolisticInfoSec.org

Toolsmith #125: ZAPR - OWASP ZAP API R Interface

It is my sincere hope that when I say OWASP Zed Attack Proxy (ZAP), you say "Hell, yeah!" rather than "What's that?". This publication has been a longtime supporter, and so many brilliant contributors and practitioners have lent to OWASP ZAP's growth, in addition to @psiinon's extraordinary project leadership. OWASP ZAP has been 1st or 2nd in the last four years of @ToolsWatch's best tool surveys for a damned good reason. OWASP ZAP usage has been well documented and presented over the years, and the wiki gives you tons to consider as you explore OWASP ZAP user scenarios.
One of the scenarios I've sought to explore recently is use of the OWASP ZAP API. The OWASP ZAP API is also well documented, with more than enough detail to get you started, but consider a few use case scenarios.
First, there is a functional, clean OWASP ZAP API UI that gives you a viewer's perspective as you contemplate programmatic opportunities. OWASP ZAP API interaction is URL based, and you can both access views and invoke actions. Explore any component and you'll immediately find related views or actions. Drilling into core via http://localhost:8067/UI/core/ (I run OWASP ZAP on 8067; your install will likely be different) gives me a ton to choose from.
You'll need your API key in order to build queries. You can find yours via Tools | Options | API | API Key. As an example, drill into numberOfAlerts (baseurl ), which gets the number of alerts, optionally filtering by URL. You'll then be presented with the query builder, where you can enter your key, define specific parameters, and decide your preferred output format, including JSON, HTML, and XML.
Sure, you'll receive results in your browser (this query will provide answers in HTML tables), but these aren't necessarily optimal for programmatic data consumption and manipulation. That said, you learn the most important part of this lesson, a fully populated OWASP ZAP API GET URL: http://localhost:8067/HTML/core/view/numberOfAlerts/?zapapiformat=HTML&apikey=2v3tebdgojtcq3503kuoq2lq5g&formMethod=GET&baseurl=.
This request would return the alert count in HTML. Very straightforward and easy to modify per your preferences, but HTML results aren't very machine friendly. Want JSON results instead? Just swap out HTML with JSON in the URL, or just choose JSON in the builder. I'll tell you that I prefer working with JSON when I use the OWASP ZAP API via the likes of R. It's certainly the cleanest, machine-consumable option, though others may argue with me in favor of XML.
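The same URL also works from the command line if you want to script against the API outside of R; for example (port and API key are placeholders):

# Fetch the current alerts as JSON from a locally running OWASP ZAP instance.
curl -s "http://localhost:8067/JSON/core/view/alerts/?zapapiformat=JSON&apikey=YOUR_API_KEY&formMethod=GET&baseurl=&start=&count="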
Allow me to provide you an example with which you can experiment, one I'll likely continue to develop against, as it's kind of cool for active reporting on OWASP ZAP scans in flight or on results when the session is complete. Note, all my code, crappy as it may be, is available for you on GitHub. I mean to say, this is really v0.1 stuff, so contribute and debug as you see fit. It's also important to note that OWASP ZAP needs to be running, either with an active scanning session or a stored session you saved earlier. I scanned my website, holisticinfosec.org, and saved the session for regular use as I wrote this. You can even see reference to the saved session by location below.
R users are likely aware of Shiny, a web application framework for R, and its dashboard capabilities. I also discovered that rCharts are designed to work interactively and beautifully within Shiny.
R includes packages that make parsing from JSON rather straightforward, as I learned from Zev Ross. RJSONIO makes it as easy as fromJSON("http://localhost:8067/JSON/core/view/alerts/?zapapiformat=JSON&apikey=2v3tebdgojtcq3503kuoq2lq5g&formMethod=GET&baseurl=&start=&count=")
to pull data from the OWASP ZAP API. We use the fromJSON "function and its methods to read content in JSON format and de-serializes it into R objects", where the ZAP API URL is that content.
I further parsed the alert data using Zev's grabInfo function and organized the results into a data frame (ZapDataDF). I then sorted the alert content from ZapDataDF into objects useful for reporting and visualization. Within each alert object are values such as the risk level, the alert message, the CWE ID, the WASC ID, and the Plugin ID. Defining each of these values as a parameter useful to R takes just a few lines (see the code on GitHub).
I then combined all those results into another data frame I called reportDF, the results of which are seen in the figure below.
reportDF results
Now we've got some content we can pivot on.
First, let's summarize the findings and present them in their resplendent glory via ZAPR: OWASP ZAP API R Interface.
Code first, truly simple stuff it is:
Summary overview API calls

You can see that we're simply using RJSONIO's fromJSON to make specific ZAP API calls. The results are quite tidy, as seen below.
ZAPR Overview
One of my favorite features in Shiny is the renderDataTable function. When utilized in a Shiny dashboard, it makes filtering results a breeze, and thus it serves as the first real feature in ZAPR. The code is tedious; review or play with it on GitHub, but the results should speak for themselves. I filtered the view by CWE ID 89 (SQL injection), which in this case is a bit of a false positive: I have a very flat web site, no database, thank you very much. Nonetheless, it's good to have an example of what would definitely be a high risk finding.


Alert filtering

Alert filtering is nice, and I'll add more results capabilities as I develop this further, but visualizations are important too. This is where rCharts really come to bear in Shiny, as they are interactive. I've used the simplest examples, but you'll get the point. First, a few wee lines of R, as seen below.
Chart code
The results are much more satisfying to look at, and allow interactivity. Ramnath Vaidyanathan has done really nice work here. First, OWASP ZAP alerts pulled via the API are counted by risk in a bar chart.
Alert counts

As I moused over Medium, we can see that there were specifically 17 results from my OWASP ZAP scan of holisticinfosec.org.
Our second visualization is the CWE ID results by count, in an oft-disdained but interactive pie chart (yes, I have some work to do on layout).


CWE IDs by count

As we learned earlier, I only had one CWE ID 89 hit during the session, and the visualization supports what we saw in the data table.
The possibilities for pulling data from the OWASP ZAP API and incorporating the results into any number of applications or report scenarios are endless. I have a feeling there is a rich opportunity here with PowerBI, which I intend to explore. All the code is here, along with the OWASP ZAP session I refer to, so you can play with it for yourself. You'll need OWASP ZAP, R, and RStudio to make it all work together; let me know if you have questions or suggestions.
Cheers, until next time.

by Russ McRee (noreply@blogger.com) at May 22, 2017 06:35 AM

May 20, 2017

Sarah Allen

serverless patterns at google i/o 2017

At Google I/O last week, we presented “how to build robust mobile applications for the distributed cloud,” a talk about building mobile apps in this new world of “serverless development.” When people hear “serverless” they sometimes think we can write code on the server that is just like client-side code, but that’s not really the point.

We have many of the same concerns in developing the code that we always have when we write client-server software — we just don’t need to manage servers directly, which drastically reduces operational challenges. In this talk we dive into some specific software development patterns that help us write robust code that scales well.

Speakers:

  • Sarah Allen (me!) I lead the engineering team that works at the intersection of Firebase and Google Cloud on the server side.
  • Judy Tuan introduced me to Firebase over 5 years ago, when she led our team at AngelHack SF (on May 3, 2012) to build an iPhone app that would paint 3D shapes by waving your phone around using the accelerometer. That event happened to be Firebase’s first public launch, and she met Andrew Lee who convinced her to use Firebase in our app. She’s an independent software developer, and also working with Tech Workers Coalition.
  • Jen-Mei Wu is a software architect at Indiegogo, and also volunteers at Liberating Ourselves Locally, a people-of-color-led, gender-diverse, queer and trans inclusive hacker/maker space in East Oakland.

Jen-Mei kicked off the talk by introducing a use case for serverless development based on her work at the maker space, which often works to help non-profits. They are limited in their ability to deploy custom software because they work with small organizations who are staffed by non-technical folk. It doesn’t make sense to set them up with something that requires devoting resources to keeping the underlying software and operating systems needed to run a web application up to date. With Firebase, the server software is automatically kept up to date with all the needed security patches for all of the required dependencies, it scales up when needed, and scales down to zero cost when not in active use.

The primary use case that motivated the talk from my perspective is for businesses that need to get started quickly, then scale up as usage grows. I found it inspiring to hear about how Firebase supports very small organizations with the same products and infrastructure that auto-scale to a global, distributed network supporting businesses with billions of users.

The concept for this talk was for some real-world developers (Judy and Jen-Mei) to envision an application that would illustrate common patterns in mobile application development, then I recruited a few Firebase engineers to join their open source team and we built the app together.

Severless Patterns

We identified some key needs that are common to mobile applications:

  • People use the app
  • The app stores data
  • The app shows the data to people
  • There is some core business logic (“secret sauce”)
  • People interact with each other in some way

The talk details some of the core development patterns:

  • Server-side “special sauce”
  • Authentication & access control
  • Databinding to simplify UI development

The App: Hubbub

Ever go to a conference and feel like you haven’t connected to the people you would have liked to meet? This app seeks to solve that! Using your public GitHub data, we might be able to connect you with other people who share your technical interests. Maybe not, but at least you’ll have lunch with somebody.

You authenticate with GitHub, then the app creates a profile for you by pulling a lot of data from the GitHub API. Your profile might include languages you code in, a list of connections based on your pull request history or the other committers on projects that you have contributed to. Then there’s a list of events, which have a time and place, which people can sign up for. Before the event, people will be placed in groups with a topic they might be interested in. They show up at the appointed time and place, and then to find their assigned group, they open a beacon screen in the app which shows an image that is unique to their group (a pattern of one or more hubbubs with their topic name) and they find each other by holding up their phones.

We built most of the app, enough to really work through the key development patterns, but didn’t have time to hook up the profile generation data collection and implement a good matching algorithm. We ended up using a simple random grouping and were able to test the app at Google I/O for lunch on Friday. Some people showed up and it worked!

You can see the code at: https://github.com/all-the-hubbub


Many thanks to all the people who joined our open source team to develop the app:

and who helped with the presentation:

  • Virginia Poltrack, graphics
  • Yael Kazaz, public speaking, story-telling coach
  • Puf and Megan, rehearsal feedback
  • Todd Kerpelman our Google I/O mentor
  • Laura Nozay and all of the wonderful Google I/O staff who made the event possible

by sarah at May 20, 2017 05:01 PM

May 15, 2017

Sean's IT Blog

Integrating Duo MFA and Amazon WorkSpaces

I recently did some work integrating Duo MFA with Amazon WorkSpaces.  The integration work ran into a few challenges, and I wanted to blog about those challenges to help others in the future.

Understanding Amazon WorkSpaces

If you couldn’t tell by the name, Amazon WorkSpaces is a cloud-hosted Desktop-as-a-Service offering that runs on Amazon Web Services (AWS).  It utilizes AWS’s underlying infrastructure to deploy desktop workloads, either using licensing provided by Amazon or the customer.  The benefit of this over on-premises VDI solutions is that Amazon manages the management infrastructure.  Amazon also provides multiple methods to integrate with the customer’s Active Directory environment, so users can continue to use their existing credentials to access the deployed desktops.

WorkSpaces uses an implementation of the PCoIP protocol for accessing desktops from the client application, and the client application is available on Windows, Linux, MacOS, iOS, and Android.  Amazon also has a web-based client that allows users to access their desktop from a web browser.  This feature is not enabled by default.

Anyway, that’s a very high-level overview of WorkSpaces.

Understanding How Users Connect to WorkSpaces

Before we can talk about how to integrate any multi-factor authentication solution into WorkSpaces, let’s go through the connection path and how that impacts how MFA is used.  WorkSpaces differs from traditional on-premises VDI in that all user connections to the desktop go through a public-facing service.  This happens even if I have a VPN tunnel or use DirectConnect.

The following image illustrates the connection path between the users and the desktop when a VPN connection is in place between the on-premises environment and AWS:

ad_connector_architecture_vpn

Image courtesy of AWS, retrieved from http://docs.aws.amazon.com/workspaces/latest/adminguide/prep_connect.html

As you can see from the diagram, on-premises users don’t connect to the WorkSpaces over the VPN tunnel – they do it over the Internet.  They are coming into their desktop from an untrusted connection, even if the endpoint they are connecting from is on a trusted network.

Note: The page that I retrieved this diagram from shows users connecting to WorkSpaces over a DirectConnect link when one is present.  This diagram shows a DirectConnect link that has a public endpoint, and all WorkSpaces connections are routed through this public endpoint.  In this configuration, logins are treated the same as logins coming in over the Internet.

When multi-factor authentication is configured for WorkSpaces, it is required for all user logins.  There is no way to configure rules to apply MFA to users connecting from untrusted networks because WorkSpaces sees all connections coming in on a public, Internet-facing service.

WorkSpaces MFA Requirements

WorkSpaces only works with MFA providers that support RADIUS.  The RADIUS servers can be located on-premises or in AWS.  I recommend placing the RADIUS proxies in another VPC in AWS where other supporting infrastructure resources, like AD Domain Controllers, are located.  These servers can be placed in the WorkSpaces VPC, but I wouldn’t recommend it.

WorkSpaces can use multiple RADIUS servers, and it will load-balance requests across all of them.  RADIUS can be configured on both the Active Directory Connectors and the managed Microsoft AD directory services options. 

Note: The Duo documentation states that RADIUS can only be configured on the Active Directory Connector. Testing shows that RADIUS works on both enterprise WorkSpaces Directory Services options.

Configuring Duo

Before we can enable MFA in WorkSpaces, Duo needs to be configured.  At least one Duo proxy needs to be reachable from the WorkSpaces VPC.  This could be on a Windows or Linux EC2 instance in the WorkSpaces VPC, but it will most likely be an EC2 instance in a shared services or management VPC or in an on-premises datacenter.  The steps for installing the Duo Proxies are beyond the scope of this article, but they can be found in the Duo documentation.

Before the Duo Authentication Proxies are configured, we need to configure a new application in the Duo Admin console.  This step generates an integration key and secret key that will be used by the proxy and configures how new users will be handled.  The steps for creating the new application in Duo are:

1. Log into your Duo Admin Console at https://admin.duosecurity.com

2. Log in as your user and select the two-factor authentication method for your account.  Once you’ve completed the two-factor sign-on, you will have access to your Duo Admin panel.

3. Click on Applications in the upper left-hand corner.

4. Click on Protect an Application

5. Search for RADIUS.

6. Click Protect this Application  underneath the RADIUS application.

image

7. Record the Integration Key, Secret Key, and API Hostname.  These will be used when configuring the Duo Proxy in a few steps.

image

8. Provide a Name for the Application.  Users will see this name in their mobile device if they use Duo Push.

image

9. Set the New User Policy to Deny Access.  Deny Access is required because the WorkSpaces setup process will fail if Duo is configured to fail open or auto-enroll.

image

10. Click Save Changes to save the new application.

Get the AWS Directory Services IP Addresses

The RADIUS Servers need to know which endpoints to accept RADIUS requests from.  When WorkSpaces is configured for MFA, these requests come from the Directory Services instance.  This can either be the Active Directory Connectors or the Managed Active Directory domain controllers.

The IP addresses that will be required can be found in the AWS Console under WorkSpaces –> Directories.  The field to use depends on the directory service type you’re using.  If you’re using Active Directory Connectors, the IP addresses are in the Directory IP Address field.  If you’re using the managed Microsoft AD service, the Directory IP address field has a value of “None,” so you will need to use the  IP addresses in the DNS Address field.

Configure AWS Security Group Rules for RADIUS

By default, the AWS security group for the Directory Services instance heavily restricts inbound and outbound traffic.  In order to enable RADIUS, you’ll need to allow inbound and outbound UDP 1812 to and from the IPs or subnet where the proxies are located (a CLI equivalent is sketched after the steps below).

The steps for updating the AWS security group rules are:

1. Log into the AWS Console.

2. Click on the Services menu and select WorkSpaces.

3. Click on Directories.

4. Copy the value in the Directory ID field of the directory that will have RADIUS configured.

5. Click on the Services menu and select VPC.

6. Click on Security Groups.

7. Paste the Directory ID into the Search field to find the Security Group attached to the Directory Services instance.

8. Select the Security Group.

9. Click on the Inbound Rules tab.

10. Click Edit.

11. Click Add Another Rule.

12. Select Custom UDP Rule in the Type dropdown box.

13. Enter 1812 in the port range field.

14. Enter the IP Address of the RADIUS Server in the Source field.

15. Repeat Steps 11 through 14 as necessary to create inbound rules for each Duo Proxy.

16. Click Save.

17. Click on the Outbound Rules tab.

18. Click Edit.

19. Click Add Another Rule.

20. Select Custom UDP Rule in the Type dropdown box.

21. Enter 1812 in the port range field.

22. Enter the IP Address of the RADIUS Server in the Destination field.

23. Repeat Steps 19 through 22 as necessary to create outbound rules for each Duo Proxy.

24. Click Save.
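
If you prefer scripting to clicking, the same rules can be added with the AWS CLI; the security group ID and proxy address below are placeholders:

# Allow RADIUS (UDP 1812) to and from a Duo Authentication Proxy at 10.0.1.10.
# sg-0123456789abcdef0 stands in for the security group attached to the directory.
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol udp --port 1812 --cidr 10.0.1.10/32

aws ec2 authorize-security-group-egress \
    --group-id sg-0123456789abcdef0 \
    --protocol udp --port 1812 --cidr 10.0.1.10/32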

Configure the Duo Authentication Proxies

Once the Duo side is configured and the AWS security rules are set up, we need to configure our Authentication Proxy or Proxies.  This needs to be done locally on each Authentication Proxy.  The steps for configuring the Authentication Proxy are:

Note:  This step assumes the authentication proxy is running on Windows. The steps for installing the authentication proxy are beyond the scope of this article.  You can find the directions for installing the proxy at: https://duo.com/docs/

1. Log into your Authentication Proxy.

2. Open the authproxy.cfg file located in C:\Program Files (x86)\Duo Security Authentication Proxy\conf.

Note: The authproxy.cfg file uses the Unix newline setup, and it may require a different text editor than Notepad.  If you can install it on your server, I recommend Notepad++ or Sublime Text. 

3. Add the following lines to your config file.  This requires the Integration Key, Secret Key, and API Hostname from Duo and the IP Addresses of the WorkSpaces Active Directory Connectors or Managed Active Directory domain controllers.  Both RADIUS servers need to use the same RADIUS shared secret, and it should be a complex string.

[duo_only_client]

[radius_server_duo_only]
ikey=<Integration Key>
skey=<Secret Key>
api_host=<API Hostname>.duosecurity.com
failmode=safe
radius_ip_1=<Directory Service IP 1>
radius_secret_1=<RADIUS Shared Secret>
radius_ip_2=<Directory Service IP 2>
radius_secret_2=<RADIUS Shared Secret>
port=1812

4. Save the configuration file.

5. Restart the Duo Authentication Proxy Service to apply the new configuration.

Integrating WorkSpaces with Duo

Now that our base infrastructure is configured, it’s time to set up Multi-Factor Authentication in WorkSpaces.  MFA is configured on the Directory Services instance.  If multiple directory services instances are required to meet different WorkSpaces use cases, then this configuration will need to be performed on each directory service instance. 

The steps for enabling WorkSpaces with Duo are:

1. Log into the AWS Console.

2. Click on the Services menu and select Directory Service.

3. Click on the Directory ID of the directory where MFA will be enabled.

4. Click the Multi-Factor authentication tab.

5. Click the Enable Multi-Factor Authentication checkbox.

6. Enter the IP address or addresses of your Duo Authentication Proxies in the RADIUS Server IP Address(es) field.

7. Enter the RADIUS Shared Secret in the Shared Secret Code and Confirm Shared Secret Code fields.

8. Change the Protocol to PAP.

9. Change the Server Timeout to 20 seconds.

10. Change the Max Retries to 3.

11. Click Update Directory.

12. If RADIUS setup is successful, the RADIUS Status should say Completed.

image

Using WorkSpaces with Duo

Once RADIUS is configured, the WorkSpaces client login prompt will change.  Users will be prompted for an MFA code along with their username and password.  This can be the one-time password generated in the mobile app, a push, or a text message with a one-time password.  If the SMS option is used, the user’s login will fail, but they will receive a text message with a one-time password. They will need to log in again using one of the codes received in the text message.

image

Troubleshooting

If the RADIUS setup does not complete successfully, there are a few things that you can check.  The first step is to verify that port 1812 is open on the Duo Authentication Proxies.  Also verify that the security group for the directory services instance is configured to allow port 1812 inbound and outbound. 

One issue that I ran into in previous setups was the New User Policy in the Duo settings.  If it is not set to Deny Access, the setup process will fail.  WorkSpaces is expecting a response code of Access Reject; if it receives anything else, the process stalls out and fails about 15 minutes later. 

Finally, review the Duo Authentication Proxy logs.  These can be found in C:\Program Files (x86)\Duo Security Authentication Proxy\log on Authentication Proxies running Windows.  A tool like WinTail can be useful for watching the logs in real-time when trying to troubleshoot issues or verify that the RADIUS Services are functioning correctly.
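
If installing a third-party tool is not an option, a few lines of Python can follow the log in much the same way. A quick sketch; the exact log file name may vary between Authentication Proxy versions:

import time

# Path from above; the log file name is assumed and may differ on your install.
LOG = r"C:\Program Files (x86)\Duo Security Authentication Proxy\log\authproxy.log"

with open(LOG, "r") as handle:
    handle.seek(0, 2)              # start at the end of the file
    while True:
        line = handle.readline()
        if line:
            print(line, end="")    # echo new log entries as they appear
        else:
            time.sleep(1)          # nothing new yet; wait and try again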


by seanpmassey at May 15, 2017 06:08 PM

May 14, 2017

SysAdmin1138

Estimates and compounding margin of error

The two sides of this story are:

  • Software requirements are often... mutable over time.
  • Developer work-estimates are kind of iffy.

Here is a true story about the latter. Many have similar tales to tell, but this one is mine. It's about metrics, agile, and large organizations. If you're expecting this to turn into a frAgile post, you're not wrong. But it's about failure-modes.

The setup

My employer at the time decided to go all-in on agile/scrum. It took several months, but every department in the company was moved to it. As they were an analytics company, their first reflex was to try to capture as much workflow data as possible so they could do magic data-analysis things on it. Which meant purchasing seats in an Agile SaaS product so everyone could track their work in it.

The details

By fiat from on-high, story points were effectively 'days'.

Due to the size of the development organization, there were three levels of Product Managers over the Individual Contributors I counted myself a part of.

The apex Product Managers (we had two) were for the two products we sold.

Marketing was also in Agile, as was Sales.

The killer feature

Because we were an analytics company, the CEO wanted a "single pane of glass" to give a snapshot of how close we were to achieving our product goals. Gathering metrics on all of our sprint-velocities, story/epic completion percentages, and story estimates allowed us to give him that pane of glass. It had:

  • Progress bars for how close our products were to their next major milestones.
  • How many sprints it would take to get there.
  • How many sprints it would take to get to the milestone beyond it.

Awesome!

The failure

That pane of glass was a lying piece of shit.

The dashboard we had to build was based on so many fuzzy measurements that it was a thumb in the wind approximation for how fast we were going, and in what direction. The human bias to trust numbers derived using Science! is a strong one, and they were inappropriately trusted. Which led to pressure from On High for highly accurate estimates, as the various Product Managers realized what was going on and attempted to compensate (remove uncertainty from one of the biggest sources of it).

Anyone who has built software knows that problems come in three types:

  1. Stuff that was a lot easier than you thought
  2. Stuff that was pretty much as bad as you thought.
  3. Hell-projects of tar-pits, quicksand, ambush-yaks, and misery.

In the absence of outside pressures, story estimates usually are pitched at type 2 efforts; the honest estimate. Workplace culture introduces biases to this, urging devs to skew one way or the other. Skew 'easier', and you'll end up overshooting your estimates a lot. Skew 'harder' and your velocity will look great, but capacity planning will suffer.

This leads to an interesting back and forth! Dev-team skews harder for estimates. PM sees that the team typically exceeds its capacity in productivity, so adds more capacity in later sprints. In theory equilibrium is reached between work-estimation and work-completion-rate. In reality, it means that the trustability of the numbers is kind of low and always will be.
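
To make the compounding concrete, here is a small simulation sketch (mine, not from the original dashboard): each story's real cost is its estimate times a random skew, and the question is how far the predicted sprint count can drift.

import math
import random

def sprints_needed(points_remaining, velocity):
    """What the dashboard reports: remaining points divided by velocity."""
    return math.ceil(points_remaining / velocity)

def simulate(stories=100, velocity=20, skew=(0.5, 2.5), runs=1000):
    """Estimate each story at 1-8 points, then let reality multiply each
    estimate by a random factor in `skew`. Compare prediction to outcomes."""
    estimates = [random.randint(1, 8) for _ in range(stories)]
    predicted = sprints_needed(sum(estimates), velocity)
    actuals = []
    for _ in range(runs):
        actual_points = sum(e * random.uniform(*skew) for e in estimates)
        actuals.append(sprints_needed(actual_points, velocity))
    return predicted, min(actuals), max(actuals)

predicted, best, worst = simulate()
print("dashboard says %d sprints; reality lands anywhere from %d to %d" % (predicted, best, worst))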

The irreducible complexity

See, the thing is, marketing and sales both need to know when a product will be released so they can kick off marketing campaigns and start warming up the sales funnel. Some kinds of ad-buys are done weeks or more in advance, so slipping product-ship at the last minute can throw off the whole marketing cadence. Trusting in (faulty) numbers means it may look like release will be in 8 weeks, so it's safe to start baking that in.

Except those numbers aren't etched in stone. They're graven in the finest of morning dew.

As that 8 week number turns into 6, then 4, then 2, pressure to hit the mark increases. For a company selling on-prem software, you can afford to miss your delivery deadline so long as you have a hotfix/service-pack process in place to deliver stability updates quickly. You see this a lot with game-dev: the shipping installer is 8GB, but there are 2GB of day-1 patches to download before you can play. SaaS products need to work immediately on release, so all-nighters may become the norm for major features tied to marketing campaigns.

Better estimates would make this process a lot more trustable. But, there is little you can do to actually improve estimate quality.

by SysAdmin1138 at May 14, 2017 03:30 PM

May 13, 2017

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – April 2017

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts/topics this month:
  1. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here!
  2. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009. Is it relevant now? You be the judge.  Succeeding with SIEM requires a lot of work, whether you paid for the software, or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” … 
  3. This month, my classic PCI DSS Log Review series is extra popular! The series of 18 posts cover a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book (now in its 4th edition!) – note that this series is mentioned in some PCI Council materials.
  4. “Simple Log Review Checklist Released!” is often at the top of this list – this aging checklist is still a very useful tool for many people. “On Free Log Management Tools” (also aged a bit by now) is a companion to the checklist (updated version)
  5. “SIEM Resourcing or How Much the Friggin’ Thing Would REALLY Cost Me?” is a quick framework for assessing the SIEM project (well, a program, really) costs at an organization (a lot more details on this here in this paper).
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has about 5X of the traffic of this blog]: 
 
Current research on cloud security monitoring:
 
Recent research on security analytics and UBA / UEBA:
 
EPIC WIN post:
 
Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Previous post in this endless series:

by Anton Chuvakin (anton@chuvakin.org) at May 13, 2017 12:16 AM

Monthly Blog Round-Up – February 2017

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts/topics this month:
  1. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010, so I have no idea why it tops the charts now!) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here!
  2. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009. Is it relevant now? You be the judge.  Succeeding with SIEM requires a lot of work, whether you paid for the software, or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” … 
  3. “Simple Log Review Checklist Released!” is often at the top of this list – this aging checklist is still a very useful tool for many people. “On Free Log Management Tools” (also aged a bit by now) is a companion to the checklist (updated version)
  4. This month, my classic PCI DSS Log Review series is extra popular! The series of 18 posts cover a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book (now in its 4th edition!) – note that this series is mentioned in some PCI Council materials.
  5. “SIEM Resourcing or How Much the Friggin’ Thing Would REALLY Cost Me?” is a quick framework for assessing the SIEM project (well, a program, really) costs at an organization (a lot more details on this here in this paper).
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has about 5X of the traffic of this blog]: 
Recent research on security analytics and UBA / UEBA:
Miscellaneous fun posts:
(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016.
Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.
Previous post in this endless series:

by Anton Chuvakin (anton@chuvakin.org) at May 13, 2017 12:08 AM

May 11, 2017

Colin Percival

A plan for open source software maintainers

I've been writing open source software for about 15 years now; while I'm still wet behind the ears compared to FreeBSD greybeards like Kirk McKusick and Poul-Henning Kamp, I've been around for long enough to start noticing some patterns. In particular:
  • Free software is expensive. Software is expensive to begin with; but good quality open source software tends to be written by people who are recognized as experts in their fields (partly thanks to that very software) and can demand commensurate salaries.
  • While that expensive developer time is donated (either by the developers themselves or by their employers), this influences what their time is used for: Individual developers like doing things which are fun or high-status, while companies usually pay developers to work specifically on the features those companies need. Maintaining existing code is important, but it is neither fun nor high-status; and it tends to get underweighted by companies as well, since maintenance is inherently unlikely to be the most urgent issue at any given time.
  • Open source software is largely a "throw code over the fence and walk away" exercise. Over the past 15 years I've written freebsd-update, bsdiff, portsnap, scrypt, spiped, and kivaloo, and done a lot of work on the FreeBSD/EC2 platform. Of these, I know bsdiff and scrypt are very widely used and I suspect that kivaloo is not; but beyond that I have very little knowledge of how widely or where my work is being used. Anecdotally it seems that other developers are in similar positions: At conferences I've heard variations on "you're using my code? Wow, that's awesome; I had no idea" many times.
  • I have even less knowledge of what people are doing with my work or what problems or limitations they're running into. Occasionally I get bug reports or feature requests; but I know I only hear from a very small proportion of the users of my work. I have a long list of feature ideas which are sitting in limbo simply because I don't know if anyone would ever use them — I suspect the answer is yes, but I'm not going to spend time implementing these until I have some confirmation of that.
  • A lot of mid-size companies would like to be able to pay for support for the software they're using, but can't find anyone to provide it. For larger companies, it's often easier — they can simply hire the author of the software (and many developers who do ongoing maintenance work on open source software were in fact hired for this sort of "in-house expertise" role) — but there's very little available for a company which needs a few minutes per month of expertise. In many cases, the best support they can find is sending an email to the developer of the software they're using and not paying anything at all — we've all received "can you help me figure out how to use this" emails, and most of us are happy to help when we have time — but relying on developer generosity is not a good long-term solution.
  • Every few months, I receive email from people asking if there's any way for them to support my open source software contributions. (Usually I encourage them to donate to the FreeBSD Foundation.) Conversely, there are developers whose work I would like to support (e.g., people working on FreeBSD wifi and video drivers), but there isn't any straightforward way to do this. Patreon has demonstrated that there are a lot of people willing to pay to support what they see as worthwhile work, even if they don't get anything directly in exchange for their patronage.

May 11, 2017 04:10 AM

May 10, 2017

Villagers with Pitchforks

What You Should Know – Air Conditioning Repair 101

To so many homeowners, owning an air conditioner is not only a luxury but also a huge necessity. This important unit keeps your home cool and comfortable during the extremely hot season. It is therefore your responsibility as a homeowner to ensure that the air circulating in your house stays cool and of good quality, by watching out for your air conditioning system. When your AC unit starts to show signs of damage, inefficiency or malfunction, the appropriate repairs ought to be done before the problem escalates into something bigger and much more costly. As a matter of fact, delaying air conditioning repairs will only lead to more problems, costlier damages, jeopardized comfort, higher energy bills, and reduced equipment lifetime. If you own an AC system in your home, below are some pointers you might want to have a look at in our article, air conditioning repair 101: what homeowners should have in mind.

How Your Air Conditioner Works

Your air conditioner cools the air around your home by removing the heat from the warm air. Warm air is sucked from the indoor space and blown through a set of pipes in the AC called the evaporator coils. The evaporator coils are filled with a liquid called a refrigerant. After absorbing heat from the air, this liquid changes state to gas and is then pumped to the condenser component of the air conditioning system. In the condenser, the absorbed heat is released into the outdoor air while the refrigerant goes back into liquid state and the cycle continues. Most air conditioners have a thermostat, which is the device that regulates the amount and temperature of cool air supplied throughout your home.

A Word about Air Filters And Your Air Conditioner

These are found in the ductwork system or at other strategic points within the HVAC system. They help remove particles from the air, both to keep the air conditioning system clean and to clean the air supplied by the unit. As the filters collect particles, they may become clogged with debris, which can lead to a reduction in air flow. This can, in turn, pave the way to energy inefficiency, potentially leading to high electricity bills, increased frequency of repairs, and a reduction in the equipment’s lifetime. This means that regular air filter replacement is not only an important aspect of central air conditioning system maintenance but also plays a huge part in preventing the need for air conditioning repairs in the first place.

Costly Air Conditioner Repairs Are Usually Preventable

Some air conditioning repairs, such as condenser replacement, can be highly expensive. One of the ways you can prevent costly AC repairs is by regularly cleaning your filters, cleaning your condensate drains, and watching for warning signs of a bad or malfunctioning air conditioning unit. Scheduling annual professional maintenance can also help, especially with contractors that are reputed for preventive maintenance services.

Choose Your AC Repair Contractor Wisely

Having as much information as possible on DIY tips and what to do when something goes wrong is very important. However, knowing how to hire an experienced, professional air conditioning repair contractor is even more important. This is because a reputed HVAC expert is well trained and qualified to handle the specific job without hitches and will be able to detect the most complex and hidden problems. When choosing an HVAC service contractor, always be on the lookout for common scams in the HVAC industry and be prepared to avoid them.

According to the manager of the Altitude Comfort Heating and Air – air conditioning repair Denver division, choosing the right HVAC contractor is the absolute most important thing to focus on when you need HVAC service. “Everything else will take care of itself if you choose a really good company to take care of your problem.”

The post What You Should Know – Air Conditioning Repair 101 appeared first on The Keeping Room.

by Level at May 10, 2017 08:56 PM

How To Become An Air Conditioning Repair Tech In Colorado

Colorado is one of the few states that doesn’t issue licenses for HVAC work. While there’s no state-wide license you need to attain, it’s still a good idea to become certified by a reputable trade school before starting work. Not only will you meet national, city, and county requirements, you’ll also have the ability to prove to your employer and clients that you’ve got the skills necessary to work as an HVAC technician.

Local Requirements

Colorado doesn’t have a state-wide license requirement. This doesn’t mean you can legally perform HVAC work without any certification anywhere in the state. It’s up to the local government of each city and county to determine what licenses are required for aspiring technicians.

Notably, Denver, CO has fairly strict requirements when it comes to HVAC licenses. All journeyman licenses require passing an ICC test in a related subject and several years of work in the field. While these licenses aren’t required to perform work, they’re very highly sought after by employers and potential clients.

Licenses are required to perform work in Colorado Springs and the surrounding parts of El Paso County. Applicants have to pass a background check and demonstrate proficiency by passing an exam. Unlike Denver, you can be fined or jailed if you’re caught performing work without an El Paso license. You can find the specific requirements for the Denver, CO Heating, ventilating, A/C Only Supervisor’s Certificate online.

National Qualifications

Students at trade schools in Colorado often focus on getting a certificate that holds value across the country. Often, the first step involves getting certified under section 608 of the EPA. This certification is mandatory for anyone who works with appliances that contain refrigerants — in other words, you need it to do maintenance on air conditioners, and you’ll definitely have to have it to perform air conditioning repair service.

Additionally, many students prepare for various ICC exams for HVAC technicians. These nationwide standards are the basis for many local, state and national certificates. Mastering the material covered by ICC exams gives you the freedom to become certified almost anywhere with just a little bit of review.

Job Training

If you’ve applied for a job recently, you’ve noticed that more and more companies want to hire people with experience. Trade schools like the Lincoln School of Technology help give you some of that experience. It’s much easier to get your foot in the door and find your first job as an HVAC technician when you’ve spent time in a classroom practicing the skills necessary for basic maintenance and repair. Not only will you get a paper certification, you’ll also be able to prove your skills when you ace tricky interview questions.

Further Experience

Most advanced licenses, like Denver’s journeyman licenses, require that you keep a log detailing your employer, your duties, and roughly how many hours you work. You need a minimum of two to four years of on-the-job experience to apply for these licenses. Once you’ve put in the time, however, these expert certificates will make getting a job easy. Potential employers will know that you’ve put in the time and acquired the skills to truly master your craft.

A Rewarding Career

HVAC technicians enjoy a challenging career with lots of opportunities for advancement. By starting your journey at a technical school, you’ll prove to your employer that you’re serious and enter the job with the skills that you need. This often means higher starting pay, faster advancement, and more freedom when it comes to finding the job you want. If you want to earn good money while helping local community members solve their problems, enroll in a reputable trade school like Lincoln Tech today.

The post How To Become An Air Conditioning Repair Tech In Colorado appeared first on The Keeping Room.

by Level at May 10, 2017 08:55 PM

How To Troubleshoot Your Air Conditioner

Air conditioning units are very much sought after especially throughout summertime when it gets very humid. It can get difficult to rest at night when the temperature rises beyond comfort. But like other home appliances, they can easily break down because of excessive use. Below are ways on how to troubleshoot your air conditioner.

Step 1

If you want to troubleshoot your air conditioner, start with the simplest and most commonly overlooked problems. If your AC is not turning on, first check the thermostat. You may find that the thermostat has been turned off or set to a temperature that will not allow your AC to turn on. You must also ensure that your electrical breaker hasn’t been tripped. At this point, you can flip the breaker off and on again to ensure that it is in the ON position. There are several good air conditioner troubleshooting charts that can be found online.

Step 2

This is where you inspect the blower unit to ensure that it is in good working condition and check whether the AC is moving air properly through the dwelling. The unit is located in the main AC housing outside of your house and looks like a hamster wheel. Clean the blower unit and check for any damaged or bent blades. Bent or damaged blades can cause inconvenience and ineffectiveness. If you find that you need to buy parts for your air conditioner, you can enter your model and serial number into some websites, and search for parts for the most common repair problems.

Check the duct system that runs from the AC to the vents in your house. These ducts are mostly located in the basement or attic. Ensure that there are no kinked, pinched, or disconnected ducts. Depending on the condition of the AC filter, you can either clean or replace it every month. A clogged or dirty filter will affect the air flow, which is why filters require regular maintenance. The AC filter is located along the duct that returns air from the home to the AC.

Step 3

This is a very important step if you want to effectively troubleshoot your air conditioner. Check whether your condenser fan is properly functioning. If it is not functioning, you can try to spin the fan by hand. The fan blades should spin on their own for 5-10 seconds. If the blades do not spin on their own, there is likely something wrong with the bearings and you’ll need to replace the motor. Some manufacturers also have troubleshooting guides online for their specific air conditioner models.

Step 4

If your AC’s air isn’t cold enough, turn on the unit and allow it to run for 10-15 minutes. Shut the unit off, go to the AC unit, and locate the copper suction line. Feel the line with your hand; it should be ice cold. If the suction line is not cold, your system may require more refrigerant. You can contact an expert who can add the necessary refrigerant and seal any air conditioner leaks. They must be a reputable and licensed air conditioning repair contractor.

Step 5

At this step, open the AC housing and check whether the compressor and fan are running. The compressor is found in the main AC unit and looks like a round canister with tubing coming out of it. If the fan is running and the compressor is not, air will still flow through your home’s vents, but it won’t be cold. If the compressor is hot and not running, the compressor has broken down and you will need a professional.

The post How To Troubleshoot Your Air Conditioner appeared first on The Keeping Room.

by Level at May 10, 2017 08:55 PM

May 08, 2017

TaoSecurity

Latest Book Inducted into Cybersecurity Canon

Thursday evening Mrs B and I were pleased to attend an awards seminar for the Cybersecurity Canon. This is a project sponsored by Palo Alto Networks and led by Rick Howard. The goal is to "identify a list of must-read books for all cybersecurity practitioners."

Rick reviewed my fourth book The Practice of Network Security Monitoring in 2014 and someone nominated it for consideration in 2016. I was unaware earlier this year that my book was part of a 32-title "March Madness" style competition. My book won its five rounds, resulting in its inclusion in the 2017 inductee list! Thank you to all those who voted for my book.

Ben Rothke awarded me the Canon trophy.
Ben Rothke interviewed me prior to the induction ceremony. We discussed some current trends in security and some lessons from the book. I hope to see that interview published by Palo Alto Networks and/or the Cybersecurity Canon project in the near future.

In my acceptance speech I explained how I wrote the book because I had not yet dedicated a book to my youngest daughter, since she was born after my third book was published.

A teaching moment at Black Hat Abu Dhabi in December 2012 inspired me to write the book. While teaching network security monitoring, one of the students asked "but where do I install the .exe on the server?"

I realized this student had no idea of physical access to a wire, or using a system to collect and store network traffic, or any of the other fundamental concepts inherent to NSM. He thought NSM was another magical software package to install on his domain controller.

Four foreign language editions.
Thanks to the interpretation assistance of a local Arabic speaker, I was able to get through to him. However, the experience convinced me that I needed to write a new book that built NSM from the ground up, hence the selection of topics and the order in which I presented them.

While my book has not (yet?) been translated into Arabic, there are two Chinese language editions, a Korean edition, and a Polish edition! I also know of several SOCs who provide a copy of the book to all incoming analysts. The book is also a text in several college courses.

I believe the book remains relevant for anyone who wants to learn the NSM methodology to detect and respond to intrusions. While network traffic is the example data source used in the book, the NSM methodology is data source agnostic.

In 2002 Bamm Visscher and I defined NSM as "the collection, analysis, and escalation of indications and warnings to detect and respond to intrusions." This definition makes no reference to network traffic.

It is the collection-analysis-escalation framework that matters. You could perform NSM using log files, or host-centric data, or whatever else you use for indications and warning.

I have no plans for another cybersecurity book. I am currently editing a book about combat mindset written by the head instructor of my Krav Maga style and his colleague.
Thanks for asking for an autograph!

Palo Alto hosted a book signing and offered free books for attendees. I got a chance to speak with Steven Levy, whose book Hackers was also inducted. I sat next to him during the book signing, as shown in the picture at right.

Thank you to Palo Alto Networks, Rick Howard, Ben Rothke, and my family for making inclusion in the Cybersecurity Canon possible. The awards dinner was a top-notch event. Mrs B and I enjoyed meeting a variety of people, including students in local cybersecurity degree programs.

I closed my acceptance speech with the following from the end of the Old Testament, at the very end of 2nd Maccabees. It captures my goal when writing books:

"So I too will here end my story. If it is well told and to the point, that is what I myself desired; if it is poorly done and mediocre, that was the best I could do."

If you'd like a copy of The Practice of Network Security Monitoring the best deal is to buy print and electronic editions from the publisher's Web site. Use code NSM101 to save 30%. I like having the print version for easy review, and I carry the digital copy on my tablet and phone.

Thank you to everyone who voted and who also bought a copy of my book!

Update: I forgot to thank Doug Burks, who created Security Onion, the software used to demonstrate NSM in the book. Doug also contributed the appendix explaining certain SO commands. Thank you Doug! Also thank you to Bill Pollack and his team at No Starch Press, who edited and published the book!

by Richard Bejtlich (noreply@blogger.com) at May 08, 2017 03:20 PM

May 07, 2017

Electricmonk.nl

Merging two Python dictionaries by deep-updating

Say we have two Python dictionaries:

{
    'name': 'Ferry',
    'hobbies': ['programming', 'sci-fi']
}

and

{
    'hobbies': ['gaming']
}

What if we want to merge these two dictionaries such that "gaming" is added to the "hobbies" key of the first dictionary? I couldn't find anything online that did this already, so I wrote the following function for it:

# Copyright Ferry Boender, released under the MIT license.
import copy


def deepupdate(target, src):
    """Deep update target dict with src
    For each k,v in src: if k doesn't exist in target, it is deep copied from
    src to target. Otherwise, if v is a list, target[k] is extended with
    src[k]. If v is a set, target[k] is updated with v. If v is a dict,
    recursively deep-update it.

    Examples:
    >>> t = {'name': 'Ferry', 'hobbies': ['programming', 'sci-fi']}
    >>> deepupdate(t, {'hobbies': ['gaming']})
    >>> print t
    {'name': 'Ferry', 'hobbies': ['programming', 'sci-fi', 'gaming']}
    """
    for k, v in src.items():
        if type(v) == list:
            if k not in target:
                target[k] = copy.deepcopy(v)
            else:
                target[k].extend(v)
        elif type(v) == dict:
            if k not in target:
                target[k] = copy.deepcopy(v)
            else:
                deepupdate(target[k], v)
        elif type(v) == set:
            if k not in target:
                target[k] = v.copy()
            else:
                target[k].update(v.copy())
        else:
            target[k] = copy.copy(v)

It uses a combination of deepcopy(), updating and self recursion to perform a complete merger of the two dictionaries.
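
The function also recurses into nested dictionaries and merges sets. A quick illustration, assuming deepupdate() from above is in scope:

profile = {'name': 'Ferry', 'meta': {'tags': {'python'}, 'visits': 1}}
update = {'meta': {'tags': {'django'}, 'visits': 2}}

deepupdate(profile, update)
print(profile)
# {'name': 'Ferry', 'meta': {'tags': {'python', 'django'}, 'visits': 2}}
# (set and key ordering may differ)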

As mentioned in the comment, the above function is released under the MIT license, so feel free to use it in any of your programs.

by admin at May 07, 2017 02:56 PM

May 05, 2017

R.I.Pienaar

Talk about Choria and NATS

Recently I was given the opportunity by the NATS.io folk to talk about Choria and NATS on one of their community events. The recording of the talk as well as the slide deck can be found below.

Thanks again for having me NATS.io team!

by R.I. Pienaar at May 05, 2017 05:54 AM

May 04, 2017

OpenSSL

Using TLS1.3 With OpenSSL

The forthcoming OpenSSL 1.1.1 release will include support for TLSv1.3. The new release will be binary and API compatible with OpenSSL 1.1.0. In theory, if your application supports OpenSSL 1.1.0, then all you need to do to upgrade is to drop in the new version of OpenSSL when it becomes available and you will automatically start being able to use TLSv1.3. However there are some issues that application developers and deployers need to be aware of. In this blog post I am going to cover some of those things.

Differences with TLS1.2 and below

TLSv1.3 is a major rewrite of the specification. There was some debate as to whether it should really be called TLSv2.0 - but TLSv1.3 it is. There are major changes and some things work very differently. A brief, incomplete, summary of some things that you are likely to notice follows:

  • There are new ciphersuites that only work in TLSv1.3. The old ciphersuites cannot be used for TLSv1.3 connections.
  • The new ciphersuites are defined differently and do not specify the certificate type (e.g. RSA, DSA, ECDSA) or the key exchange mechanism (e.g. DHE or ECDHE). This has implications for ciphersuite configuration.
  • Clients provide a “key_share” in the ClientHello. This has consequences for “group” configuration.
  • Sessions are not established until after the main handshake has been completed. There may be a gap between the end of the handshake and the establishment of a session (or, in theory, a session may not be established at all). This could have impacts on session resumption code.
  • Renegotiation is not possible in a TLSv1.3 connection
  • More of the handshake is now encrypted.
  • More types of messages can now have extensions (this has an impact on the custom extension APIs and Certificate Transparency)
  • DSA certificates are no longer allowed in TLSv1.3 connections

Note that at this stage only TLSv1.3 is supported. DTLSv1.3 is still in the early days of specification and there is no OpenSSL support for it at this time.

Current status of the TLSv1.3 standard

As of the time of writing TLSv1.3 is still in draft. Periodically a new version of the draft standard is published by the TLS Working Group. Implementations of the draft are required to identify the specific draft version that they are using. This means that implementations based on different draft versions do not interoperate with each other.

OpenSSL 1.1.1 will not be released until (at least) TLSv1.3 is finalised. In the meantime the OpenSSL git master branch contains our development TLSv1.3 code which can be used for testing purposes (i.e. it is not for production use). You can check which draft TLSv1.3 version is implemented in any particular OpenSSL checkout by examining the value of the TLS1_3_VERSION_DRAFT_TXT macro in the tls1.h header file. This macro will be removed when the final version of the standard is released.

In order to compile OpenSSL with TLSv1.3 support you must use the “enable-tls1_3” option to “config” or “Configure”.

Currently OpenSSL has implemented the “draft-20” version of TLSv1.3. Many other libraries are still using older draft versions in their implementations. Notably many popular browsers are using “draft-18”. This is a common source of interoperability problems. Interoperability of the draft-18 version has been tested with BoringSSL, NSS and picotls.

Within the OpenSSL git source code repository there are two branches: “tls1.3-draft-18” and “tls1.3-draft-19”, which implement the older TLSv1.3 draft versions. In order to test interoperability with other TLSv1.3 implementations you may need to use one of those branches. Note that those branches are considered temporary and are likely to be removed in the future when they are no longer needed.

Ciphersuites

OpenSSL has implemented support for five TLSv1.3 ciphersuites as follows:

  • TLS13-AES-256-GCM-SHA384
  • TLS13-CHACHA20-POLY1305-SHA256
  • TLS13-AES-128-GCM-SHA256
  • TLS13-AES-128-CCM-8-SHA256
  • TLS13-AES-128-CCM-SHA256

Of these the first three are in the DEFAULT ciphersuite group. This means that if you have no explicit ciphersuite configuration then you will automatically use those three and will be able to negotiate TLSv1.3.

All the TLSv1.3 ciphersuites also appear in the HIGH ciphersuite alias. The CHACHA20, AES, AES128, AES256, AESGCM, AESCCM and AESCCM8 ciphersuite aliases include a subset of these ciphersuites as you would expect based on their names. Key exchange and authentication properties were part of the ciphersuite definition in TLSv1.2 and below. This is no longer the case in TLSv1.3 so ciphersuite aliases such as ECDHE, ECDSA, RSA and other similar aliases do not contain any TLSv1.3 ciphersuites.

If you explicitly configure your ciphersuites then care should be taken to ensure that you are not inadvertently excluding all TLSv1.3 compatible ciphersuites. If a client has TLSv1.3 enabled but no TLSv1.3 ciphersuites configured then it will immediately fail (even if the server does not support TLSv1.3) with an error message like this:

140460646909376:error:141A90B5:SSL routines:ssl_cipher_list_to_bytes:no ciphers available:ssl/statem/statem_clnt.c:3562:No ciphers enabled for max supported SSL/TLS version

Similarly if a server has TLSv1.3 enabled but no TLSv1.3 ciphersuites it will also immediately fail, even if the client does not support TLSv1.3, with an error message like this:

140547390854592:error:141FC0B5:SSL routines:tls_setup_handshake:no ciphers available:ssl/statem/statem_lib.c:108:No ciphers enabled for max supported SSL/TLS version

For example, setting a ciphersuite selection string of ECDHE:!COMPLEMENTOFDEFAULT will work in OpenSSL 1.1.0 and will only select those ciphersuites that are in DEFAULT and also use ECDHE for key exchange. However no TLSv1.3 ciphersuites are in the ECDHE group so this ciphersuite configuration will fail in OpenSSL 1.1.1 if TLSv1.3 is enabled.

You may want to explicitly list the TLSv1.3 ciphersuites you want to use to avoid problems. For example:

"TLS13-CHACHA20-POLY1305-SHA256:TLS13-AES-128-GCM-SHA256:TLS13-AES-256-GCM-SHA384:ECDHE:!COMPLEMENTOFDEFAULT"

You can test which ciphersuites are included in a given ciphersuite selection string using the openssl ciphers -s -v command:

$ openssl ciphers -s -v "ECDHE:!COMPLEMENTOFDEFAULT"

Ensure that at least one ciphersuite supports TLSv1.3

Groups

In TLSv1.3 the client selects a “group” that it will use for key exchange. At the time of writing, OpenSSL only supports ECDHE groups for this (it is possible that DHE groups will also be supported by the time OpenSSL 1.1.1 is actually released). The client then sends “key_share” information to the server for its selected group in the ClientHello.

The list of supported groups is configurable. It is possible for a client to select a group that the server does not support. In this case the server requests that the client sends a new key_share that it does support. While this means a connection will still be established (assuming a mutually supported group exists), it does introduce an extra server round trip - so this has implications for performance. In the ideal scenario the client will select a group that the server supports in the first instance.

In practice most clients will use X25519 or P-256 for their initial key_share. For maximum performance it is recommended that servers are configured to support at least those two groups and clients use one of those two for its initial key_share. This is the default case (OpenSSL clients will use X25519).

The group configuration also controls the allowed groups in TLSv1.2 and below. If applications have previously configured their groups in OpenSSL 1.1.0 then you should review that configuration to ensure that it still makes sense for TLSv1.3. The first named (i.e. most preferred) group will be the one used by an OpenSSL client in its initial key_share.

Applications can configure the group list by using SSL_CTX_set1_groups() or a similar function (see here for further details). Alternatively, if applications use SSL_CONF style configuration files then this can be configured using the Groups or Curves command (see here).

Sessions

In TLSv1.2 and below a session is established as part of the handshake. This session can then be used in a subsequent connection to achieve an abbreviated handshake. Applications might typically obtain a handle on the session after a handshake has completed using the SSL_get1_session() function (or similar). See here for further details.

In TLSv1.3 sessions are not established until after the main handshake has completed. The server sends a separate post-handshake message to the client containing the session details. Typically this will happen soon after the handshake has completed, but it could be sometime later (or not at all).

The specification recommends that applications only use a session once (although this is not enforced). For this reason some servers send multiple session messages to a client. To enforce the “use once” recommendation applications could use SSL_CTX_remove_session() to mark a session as non-resumable (and remove it from the cache) once it has been used.

The old SSL_get1_session() and similar APIs may not operate as expected for client applications written for TLSv1.2 and below. Specifically if a client application calls SSL_get1_session() before the server message containing session details has been received then an SSL_SESSION object will still be returned, but any attempt to resume with it will not succeed and a full handshake will occur instead. In the case where multiple sessions have been sent by the server then only the last session will be returned by SSL_get1_session().

Client application developers should consider using the SSL_CTX_sess_set_new_cb() API instead (see here). This provides a callback mechanism which gets invoked every time a new session is established. This can get invoked multiple times for a single connection if a server sends multiple session messages.

Note that SSL_CTX_sess_set_new_cb() was also available in OpenSSL 1.1.0. Applications that already used that API will still work, but they may find that the callback is invoked at unexpected times, i.e. post-handshake.

An OpenSSL server will immediately attempt to send session details to a client after the main handshake has completed. To server applications this post-handshake stage will appear to be part of the main handshake, so calls to SSL_get1_session() should continue to work as before.

Custom Extensions and Certificate Transparency

In TLSv1.2 and below the initial ClientHello and ServerHello messages can contain “extensions”. This allows the base specifications to be extended with additional features and capabilities that may not be applicable in all scenarios or could not be foreseen at the time that the base specifications were written. OpenSSL provides support for a number of “built-in” extensions.

Additionally the custom extensions API provides some basic capabilities for application developers to add support for new extensions that are not built-in to OpenSSL.

Built on top of the custom extensions API is the “serverinfo” API. This provides an even more basic interface that can be configured at run time. One use case for this is Certificate Transparency. OpenSSL provides built-in support for the client side of Certificate Transparency but there is no built-in server side support. However this can easily be achieved using “serverinfo” files. A serverinfo file containing the Certificate Transparency information can be configured within OpenSSL and it will then be sent back to the client as appropriate.

In TLSv1.3 the use of extensions is expanded significantly and there are many more messages that can include them. Additionally some extensions that were applicable to TLSv1.2 and below are no longer applicable in TLSv1.3 and some extensions are moved from the ServerHello message to the EncryptedExtensions message. The old custom extensions API does not have the ability to specify which messages the extensions should be associated with. For that reason a new custom extensions API was required.

The old API will still work, but the custom extensions will only be added where TLSv1.2 or below is negotiated. To add custom extensions that work for all TLS versions application developers will need to update their applications to the new API (see here for details).

The “serverinfo” data format has also been updated to include additional information about which messages the extensions are relevant to. Applications using “serverinfo” files may need to update to the “version 2” file format to be able to operate in TLSv1.3 (see here and here for details).

Renegotiation

TLSv1.3 does not have renegotiation so calls to SSL_renegotiate() or SSL_renegotiate_abbreviated() will immediately fail if invoked on a connection that has negotiated TLSv1.3.

The most common use case for renegotiation is to update the connection keys. The function SSL_key_update() can be used for this purpose in TLSv1.3 (see https://www.openssl.org/docs/manmaster/man3/SSL_key_update.html).

DSA certificates

DSA certificates are no longer allowed in TLSv1.3. If your server application is using a DSA certificate then TLSv1.3 connections will fail with an error message similar to the following:

140348850206144:error:14201076:SSL routines:tls_choose_sigalg:no suitable signature algorithm:ssl/t1_lib.c:2308:

Please use an ECDSA or RSA certificate instead.

Conclusion

TLSv1.3 represents a significant step forward and has some exciting new features but there are some hazards for the unwary when upgrading. Mostly these issues have relatively straightforward solutions. Application developers should review their code and consider whether anything should be updated in order to work more effectively with TLSv1.3. Similarly application deployers should review their configuration.

May 04, 2017 11:00 AM

Michael Biven

Chai 2000 - 2017

I lost my little buddy of 17 years this week.

Chai, I’m happy you had a peaceful morning in the patio at our new house doing what you loved. Sitting under bushes and in the sun. Even in your last day you still comforted us. You were relaxed throughout the morning, when I picked you up and held you as we drove to the vet till the end.

Chai, thank you for everything, we love you and you will be missed.

May 04, 2017 08:04 AM

May 03, 2017

Marios Zindilis

Back-end Validation for Django Model Field Choices

In Django, you can provide a list of values that a field can have when creating a model. For example, here's a minimalistic model for storing musicians:

from django.db import models


class Artist(models.Model):

    TYPE_CHOICES = (
        ('Person', 'Person'),
        ('Group', 'Group'),
        ('Other', 'Other'),)

    name = models.CharField(max_length=100)
    type = models.CharField(max_length=20, choices=TYPE_CHOICES)

Using this code in the Admin interface, as well as in Forms, will make the "Type" field be represented as a drop-down box. Therefore, if all input comes from Admin and Forms, the value of "Type" is expected to be one of those defined in the TYPE_CHOICES tuple. There are 2 things worth noting:

  1. Using choices is a presentation convenience. There is no restriction of the value that can be sent to the database, even coming from the front-end. For example, if you use the browser's inspect feature, you can edit the drop-down list, change one of the values and submit it. The back-end view will then happily consume it and save it to the database.

  2. There is no validation at any stage of the input lifecycle. You may have a specified list of values in the Admin or Forms front-end, but your application may accept input from other sources as well, such as management commands, or bulk imports from fixtures. In none of those cases will a value that is not in the TYPE_CHOICES structure be rejected.

Validation

So let's find a way to fix this by adding some validation in the back-end.

Django has this amazing feature called signals. They allow you to attach extra functionality to some actions, by emitting a signal from that action. For example, in this case we will use the pre_save signal, which gets executed just before a model instance is saved to the database.

Here's the example model, with the signal connected to it:

from django.db import models


class Artist(models.Model):

    TYPE_CHOICES = (
        ('Person', 'Person'),
        ('Group', 'Group'),
        ('Other', 'Other'),)

    name = models.CharField(max_length=100)
    type = models.CharField(max_length=20, choices=TYPE_CHOICES)


models.signals.pre_save.connect(validate_artist_name_choice, sender=Artist)

That last line will call a function named validate_artist_name_choice just before an Artist is saved in the database. Let's write that function:


def validate_artist_name_choice(sender, instance, **kwargs):
    valid_types = [t[0] for t in sender.TYPE_CHOICES]
    if instance.type not in valid_types:
        from django.core.exceptions import ValidationError
        raise ValidationError(
            'Artist Type "{}" is not one of the permitted values: {}'.format(
                instance.type,
               ', '.join(valid_types)))

This function can be anywhere in your code, as long as you can import it in the model. Personally, if a pre_save function is specific to one model, I like to put it in the same module as the model itself, but that's just my preference.

So let's try to create a couple of Artists and see what happens. In this example, my App is called "v".

>>> from v.models import Artist
>>> 
>>> some_artist = Artist(name='Some Artist', type='Person')
>>> some_artist.save()
>>> 
>>> some_other_artist = Artist(name='Some Other Artist', type='Alien Lifeform')
>>> some_other_artist.save()
Traceback (most recent call last):
  File "/usr/lib/python3.5/code.py", line 91, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
  File "/home/vs/.local/lib/python3.5/site-packages/django/db/models/base.py", line 796, in save
    force_update=force_update, update_fields=update_fields)
  File "/home/vs/.local/lib/python3.5/site-packages/django/db/models/base.py", line 820, in save_base
    update_fields=update_fields)
  File "/home/vs/.local/lib/python3.5/site-packages/django/dispatch/dispatcher.py", line 191, in send
    response = receiver(signal=self, sender=sender, **named)
  File "/home/vs/Tests/validator/v/models.py", line 14, in validate_artist_name_choice
    ', '.join(valid_types)))
django.core.exceptions.ValidationError: ['Artist Type "Alien Lifeform" is not one of the permitted values: Person, Group, Other']

Brilliant! We now have an expressive exception that we can catch in our views or scripts or whatever else we want to use to create model instances. This exception is thrown either when we call the .save() method of a model instance, or when we use the .create() method of a model class. Using the example model, we would get the same exception if we ran:

Artist.objects.create(name='Some Other Artist', type='Alien Lifeform')

Or even:

Artist.objects.get_or_create(name='Some Other Artist', type='Alien Lifeform')
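
In a view, the exception can be caught and turned into a useful error response. Here is a minimal sketch; the view and the JSON shape are illustrative and not part of the original example:

from django.core.exceptions import ValidationError
from django.http import JsonResponse

from .models import Artist


def create_artist(request):
    try:
        artist = Artist.objects.create(
            name=request.POST.get('name', ''),
            type=request.POST.get('type', ''))
    except ValidationError as error:
        # The pre_save signal raises this for invalid "type" values.
        return JsonResponse({'errors': error.messages}, status=400)
    return JsonResponse({'id': artist.pk, 'name': artist.name}, status=201)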

Test It!

Here's a test that triggers the exception:

from django.test import TestCase
from django.core.exceptions import ValidationError
from .models import Artist


class TestArtist(TestCase):
    def setUp(self):
        self.test_artist = Artist(name='Some Artist', type='Some Type')

    def test_artist_type_choice(self):
        with self.assertRaises(ValidationError):
            self.test_artist.save()

Conclusion

Using Django's pre_save functionality, you can validate your application's input in the back-end, and avoid any surprises. Go forth and validate!

May 03, 2017 11:00 PM

Vincent Bernat

VXLAN: BGP EVPN with Cumulus Quagga

VXLAN is an overlay network to encapsulate Ethernet traffic over an existing (highly available and scalable, possibly the Internet) IP network while accomodating a very large number of tenants. It is defined in RFC 7348. For an uncut introduction on its use with Linux, have a look at my “VXLAN & Linux” post.

VXLAN deployment

In the above example, we have hypervisors hosting virtual machines from different tenants. Each virtual machine is given access to a tenant-specific virtual Ethernet segment. Users are expecting classic Ethernet segments: no MAC restrictions1, total control over the IP addressing scheme they use and availability of multicast.

In a large VXLAN deployment, two aspects need attention:

  1. discovery of other endpoints (VTEPs) sharing the same VXLAN segments, and
  2. avoidance of BUM frames (broadcast, unknown unicast and multicast) as they have to be forwarded to all VTEPs.

A typical solution for the first point is using multicast; for the second point, it is source-address learning.

Introduction to BGP EVPN

BGP EVPN (RFC 7432 and draft-ietf-bess-evpn-overlay for its application to VXLAN) is a standard control protocol to efficiently solve those two aspects without relying on multicast or source-address learning.

BGP EVPN relies on BGP (RFC 4271) and its MP-BGP extensions (RFC 4760). BGP is the routing protocol powering the Internet. It is highly scalable and interoperable. It is also extensible and one of its extensions is MP-BGP. This extension can carry reachability information (NLRI) for multiple protocols (IPv4, IPv6, L3VPN and in our case EVPN). EVPN is a special family to advertise MAC addresses and the remote equipment they are attached to.

There are basically two kinds of reachability information a VTEP sends through BGP EVPN:

  1. the VNIs they have interest in (type 3 routes), and
  2. for each VNI, the local MAC addresses (type 2 routes).

The protocol also covers other aspects of virtual Ethernet segments (L3 reachability information from ARP/ND caches, MAC mobility and multi-homing2) but we won’t describe them here.

To deploy BGP EVPN, a typical solution is to use several route reflectors (both for redundancy and scalability), like in the picture below. Each VTEP opens a BGP session to at least two route reflectors, sends its information (MACs and VNIs) and receives others’. This reduces the number of BGP sessions to configure.

VXLAN deployment with route reflectors

Compared to other solutions to deploy VXLAN, BGP EVPN has three main advantages:

  • interoperability with other vendors (notably Juniper and Cisco),
  • proven scalability (a typical BGP router handles several million routes), and
  • possibility to enforce fine-grained policies.

On Linux, Cumulus Quagga is a fairly complete implementation of BGP EVPN (type 3 routes for VTEP discovery, type 2 routes with MAC or IP addresses, MAC mobility when a host changes from one VTEP to another one) which requires very little configuration.

This is a fork of Quagga and currently used in Cumulus Linux, a network operating system based on Debian powering switches from various brands. At some point, BGP EVPN support will be contributed back to FRR, a community-maintained fork of Quagga3.

It should be noted the BGP EVPN implementation of Cumulus Quagga currently only supports IPv4.

Route reflector setup

Before configuring each VTEP, we need to configure two or more route reflectors. There are many solutions. I will present three of them:

  • using Cumulus Quagga,
  • using GoBGP, an implementation of BGP in Go,
  • using Juniper JunOS.

For reliability purposes, it’s possible (and easy) to use one implementation for some route reflectors and another implementation for the others.

The proposed configurations are quite minimal. However, it is possible to centralize policies on the route reflectors (e.g. routes tagged with some community can only be readvertised to some group of VTEPs).

Using Quagga

The configuration is pretty simple. We suppose the route reflector has 203.0.113.254 configured as a loopback IP.

router bgp 65000
  bgp router-id 203.0.113.254
  bgp cluster-id 203.0.113.254
  bgp log-neighbor-changes
  no bgp default ipv4-unicast
  neighbor fabric peer-group
  neighbor fabric remote-as 65000
  neighbor fabric capability extended-nexthop
  neighbor fabric update-source 203.0.113.254
  bgp listen range 203.0.113.0/24 peer-group fabric
  !
  address-family evpn
   neighbor fabric activate
   neighbor fabric route-reflector-client
  exit-address-family
  !
  exit
!

A peer group fabric is defined and we leverage the dynamic neighbor feature of Cumulus Quagga: we don’t have to explicitly define each neighbor. Any client from 203.0.113.0/24 presenting itself as part of AS 65000 can connect. All sent EVPN routes will be accepted and reflected to the other clients.

You don’t need to run Zebra, the route engine talking with the kernel. Instead, start bgpd with the --no_kernel flag.
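
A minimal invocation might then look like this; the paths are assumptions and depend on how Cumulus Quagga was packaged on your system:

/usr/lib/quagga/bgpd -d --no_kernel \
    -f /etc/quagga/bgpd.conf \
    -i /var/run/quagga/bgpd.pid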

Using GoBGP

GoBGP is a clean implementation of BGP in Go4. It exposes an RPC API for configuration (but accepts a configuration file and comes with a command-line client).

It doesn’t support dynamic neighbors yet, so you’ll have to use the API, the command-line client or some templating language to automate their declaration. A configuration with only one neighbor looks like this:

global:
  config:
    as: 65000
    router-id: 203.0.113.254
    local-address-list:
      - 203.0.113.254
neighbors:
  - config:
      neighbor-address: 203.0.113.1
      peer-as: 65000
    afi-safis:
      - config:
          afi-safi-name: l2vpn-evpn
    route-reflector:
      config:
        route-reflector-client: true
        route-reflector-cluster-id: 203.0.113.254

More neighbors can be added from the command line:

$ gobgp neighbor add 203.0.113.2 as 65000 \
>         route-reflector-client 203.0.113.254 \
>         --address-family evpn

GoBGP won’t try to interact with the kernel which is fine as a route reflector.
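
To start GoBGP with the YAML configuration above, something along these lines should work (the configuration path is an assumption):

gobgpd --config-type yaml --config-file /etc/gobgpd.yml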

Using Juniper JunOS

A variety of Juniper products can act as a BGP route reflector, notably:

  • the QFX5100 switch5,
  • an MX-series router6, and
  • the virtual route reflector (vRR)7.

The main factor is the CPU and the memory. The QFX5100 is low on memory and won’t support large deployments without some additional policing.

Here is a configuration similar to the Quagga one:

interfaces {
    lo0 {
        unit 0 {
            family inet {
                address 203.0.113.254/32;
            }
        }
    }
}

protocols {
    bgp {
        group fabric {
            family evpn {
                signaling {
                    /* Do not try to install EVPN routes */
                    no-install;
                }
            }
            type internal;
            cluster 203.0.113.254;
            local-address 203.0.113.254;
            allow 203.0.113.0/24;
        }
    }
}

routing-options {
    router-id 203.0.113.254;
    autonomous-system 65000;
}

VTEP setup

The next step is to configure each VTEP/hypervisor. Each VXLAN is locally configured using a bridge for local virtual interfaces, as illustrated in the schema below. The bridge takes care of the local MAC addresses (notably, using source-address learning) while the VXLAN interface takes care of the remote MAC addresses (received with BGP EVPN).

Bridged VXLAN device

VXLANs can be provisioned with the following script. Source-address learning is disabled as we will rely solely on BGP EVPN to synchronize FDBs between the hypervisors.

for vni in 100 200; do
    # Create VXLAN interface
    ip link add vxlan${vni} type vxlan \
        id ${vni} \
        dstport 4789 \
        local 203.0.113.2 \
        nolearning
    # Create companion bridge
    brctl addbr br${vni}
    brctl addif br${vni} vxlan${vni}
    brctl stp br${vni} off
    ip link set up dev br${vni}
    ip link set up dev vxlan${vni}
done
# Attach each VM to the appropriate segment
brctl addif br100 vnet10
brctl addif br100 vnet11
brctl addif br200 vnet12
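
A quick sanity check of the result never hurts; ip -d link displays the VXLAN-specific attributes (VNI, destination port, learning disabled) and brctl lists the bridge members:

ip -d link show dev vxlan100
brctl show br100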

The configuration of Cumulus Quagga is similar to the one used for a route reflector, except we use the advertise-all-vni directive to publish all local VNIs.

router bgp 65000
  bgp router-id 203.0.113.2
  no bgp default ipv4-unicast
  neighbor fabric peer-group
  neighbor fabric remote-as 65000
  neighbor fabric capability extended-nexthop
  neighbor fabric update-source dummy0
  ! BGP sessions with route reflectors
  neighbor 203.0.113.253 peer-group fabric
  neighbor 203.0.113.254 peer-group fabric
  !
  address-family evpn
   neighbor fabric activate
   advertise-all-vni
  exit-address-family
  !
  exit
!

If everything works as expected, the instances sharing the same VNI should be able to ping each other. If IPv6 is enabled on the VMs, the ping command shows if everything is in order:

$ ping -c10 -w1 -t1 ff02::1%eth0
PING ff02::1%eth0(ff02::1%eth0) 56 data bytes
64 bytes from fe80::5254:33ff:fe00:8%eth0: icmp_seq=1 ttl=64 time=0.016 ms
64 bytes from fe80::5254:33ff:fe00:b%eth0: icmp_seq=1 ttl=64 time=4.98 ms (DUP!)
64 bytes from fe80::5254:33ff:fe00:9%eth0: icmp_seq=1 ttl=64 time=4.99 ms (DUP!)
64 bytes from fe80::5254:33ff:fe00:a%eth0: icmp_seq=1 ttl=64 time=4.99 ms (DUP!)

--- ff02::1%eth0 ping statistics ---
1 packets transmitted, 1 received, +3 duplicates, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.016/3.745/4.991/2.152 ms

Verification

Step by step, let’s check how everything comes together.

Getting VXLAN information from the kernel

On each VTEP, Quagga should be able to retrieve the information about configured VXLANs. This can be checked with vtysh:

# show interface vxlan100
Interface vxlan100 is up, line protocol is up
  Link ups:       1    last: 2017/04/29 20:01:33.43
  Link downs:     0    last: (never)
  PTM status: disabled
  vrf: Default-IP-Routing-Table
  index 11 metric 0 mtu 1500
  flags: <UP,BROADCAST,RUNNING,MULTICAST>
  Type: Ethernet
  HWaddr: 62:42:7a:86:44:01
  inet6 fe80::6042:7aff:fe86:4401/64
  Interface Type Vxlan
  VxLAN Id 100
  Access VLAN Id 1
  Master (bridge) ifindex 9 ifp 0x56536e3f3470

The important points are:

  • the VNI is 100, and
  • the bridge device was correctly detected.

Quagga should also be able to retrieve information about the local MAC addresses:

# show evpn mac vni 100
Number of MACs (local and remote) known for this VNI: 2
MAC               Type   Intf/Remote VTEP      VLAN
50:54:33:00:00:0a local  eth1.100
50:54:33:00:00:0b local  eth2.100

BGP sessions

Each VTEP has to establish a BGP session to the route reflectors. On the VTEP, this can be checked by running vtysh:

# show bgp neighbors 203.0.113.254
BGP neighbor is 203.0.113.254, remote AS 65000, local AS 65000, internal link
 Member of peer-group fabric for session parameters
  BGP version 4, remote router ID 203.0.113.254
  BGP state = Established, up for 00:00:45
  Neighbor capabilities:
    4 Byte AS: advertised and received
    AddPath:
      L2VPN EVPN: RX advertised L2VPN EVPN
    Route refresh: advertised and received(new)
    Address family L2VPN EVPN: advertised and received
    Hostname Capability: advertised
    Graceful Restart Capabilty: advertised
[...]
 For address family: L2VPN EVPN
  fabric peer-group member
  Update group 1, subgroup 1
  Packet Queue length 0
  Community attribute sent to this neighbor(both)
  8 accepted prefixes

  Connections established 1; dropped 0
  Last reset never
Local host: 203.0.113.2, Local port: 37603
Foreign host: 203.0.113.254, Foreign port: 179

The output includes the following information:

  • the BGP state is Established,
  • the address family L2VPN EVPN is correctly advertised, and
  • 8 routes are received from this route reflector.

The state of the BGP sessions can also be checked from the route reflectors. With GoBGP, use the following command:

# gobgp neighbor 203.0.113.2
BGP neighbor is 203.0.113.2, remote AS 65000, route-reflector-client
  BGP version 4, remote router ID 203.0.113.2
  BGP state = established, up for 00:04:30
  BGP OutQ = 0, Flops = 0
  Hold time is 9, keepalive interval is 3 seconds
  Configured hold time is 90, keepalive interval is 30 seconds
  Neighbor capabilities:
    multiprotocol:
        l2vpn-evpn:     advertised and received
    route-refresh:      advertised and received
    graceful-restart:   received
    4-octet-as: advertised and received
    add-path:   received
    UnknownCapability(73):      received
    cisco-route-refresh:        received
[...]
  Route statistics:
    Advertised:             8
    Received:               5
    Accepted:               5

With JunOS, use the below command:

> show bgp neighbor 203.0.113.2
Peer: 203.0.113.2+38089 AS 65000 Local: 203.0.113.254+179 AS 65000
  Group: fabric                Routing-Instance: master
  Forwarding routing-instance: master
  Type: Internal    State: Established
  Last State: OpenConfirm   Last Event: RecvKeepAlive
  Last Error: None
  Options: <Preference LocalAddress Cluster AddressFamily Rib-group Refresh>
  Address families configured: evpn
  Local Address: 203.0.113.254 Holdtime: 90 Preference: 170
  NLRI evpn: NoInstallForwarding
  Number of flaps: 0
  Peer ID: 203.0.113.2     Local ID: 203.0.113.254     Active Holdtime: 9
  Keepalive Interval: 3          Group index: 0    Peer index: 2
  I/O Session Thread: bgpio-0 State: Enabled
  BFD: disabled, down
  NLRI for restart configured on peer: evpn
  NLRI advertised by peer: evpn
  NLRI for this session: evpn
  Peer supports Refresh capability (2)
  Stale routes from peer are kept for: 300
  Peer does not support Restarter functionality
  NLRI that restart is negotiated for: evpn
  NLRI of received end-of-rib markers: evpn
  NLRI of all end-of-rib markers sent: evpn
  Peer does not support LLGR Restarter or Receiver functionality
  Peer supports 4 byte AS extension (peer-as 65000)
  NLRI's for which peer can receive multiple paths: evpn
  Table bgp.evpn.0 Bit: 20000
    RIB State: BGP restart is complete
    RIB State: VPN restart is complete
    Send state: in sync
    Active prefixes:              5
    Received prefixes:            5
    Accepted prefixes:            5
    Suppressed due to damping:    0
    Advertised prefixes:          8
  Last traffic (seconds): Received 276  Sent 170  Checked 276
  Input messages:  Total 61     Updates 3       Refreshes 0     Octets 1470
  Output messages: Total 62     Updates 4       Refreshes 0     Octets 1775
  Output Queue[1]: 0            (bgp.evpn.0, evpn)

If a BGP session cannot be established, the logs of each BGP daemon should mention the cause.

Sent routes

From each VTEP, Quagga needs to send:

  • one type 3 route for each local VNI, and
  • one type 2 route for each local MAC address.

The best place to check the received routes is on one of the route reflectors. If you are using JunOS, the following command will display the received routes from the provided VTEP:

> show route table bgp.evpn.0 receive-protocol bgp 203.0.113.2

bgp.evpn.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
  Prefix                  Nexthop              MED     Lclpref    AS path
  2:203.0.113.2:100::0::50:54:33:00:00:0a/304 MAC/IP
*                         203.0.113.2                  100        I
  2:203.0.113.2:100::0::50:54:33:00:00:0b/304 MAC/IP
*                         203.0.113.2                  100        I
  3:203.0.113.2:100::0::203.0.113.2/304 IM
*                         203.0.113.2                  100        I
  3:203.0.113.2:200::0::203.0.113.2/304 IM
*                         203.0.113.2                  100        I

There is one type 3 route for VNI 100 and another one for VNI 200. There are also two type 2 routes for two MAC addresses on VNI 100. To get more information, you can add the keyword extensive. Here is a type 3 route advertising 203.0.113.2 as a VTEP for VNI 1008:

> show route table bgp.evpn.0 receive-protocol bgp 203.0.113.2 extensive

bgp.evpn.0: 11 destinations, 11 routes (11 active, 0 holddown, 0 hidden)
* 3:203.0.113.2:100::0::203.0.113.2/304 IM (1 entry, 1 announced)
     Accepted
     Route Distinguisher: 203.0.113.2:100
     Nexthop: 203.0.113.2
     Localpref: 100
     AS path: I
     Communities: target:65000:268435556 encapsulation:vxlan(0x8)
[...]

Here is a type 2 route announcing the location of the 50:54:33:00:00:0a MAC address for VNI 100:

> show route table bgp.evpn.0 receive-protocol bgp 203.0.113.2 extensive

bgp.evpn.0: 11 destinations, 11 routes (11 active, 0 holddown, 0 hidden)
* 2:203.0.113.2:100::0::50:54:33:00:00:0a/304 MAC/IP (1 entry, 1 announced)
     Accepted
     Route Distinguisher: 203.0.113.2:100
     Route Label: 100
     ESI: 00:00:00:00:00:00:00:00:00:00
     Nexthop: 203.0.113.2
     Localpref: 100
     AS path: I
     Communities: target:65000:268435556 encapsulation:vxlan(0x8)
[...]

With Quagga, you can get a similar output with vtysh:

# show bgp evpn route
BGP table version is 0, local router ID is 203.0.113.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-2 prefix: [2]:[ESI]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 203.0.113.2:100
*>i[2]:[0]:[0]:[48]:[50:54:33:00:00:0a]
                    203.0.113.2                   100      0 i
*>i[2]:[0]:[0]:[48]:[50:54:33:00:00:0b]
                    203.0.113.2                   100      0 i
*>i[3]:[0]:[32]:[203.0.113.2]
                    203.0.113.2                   100      0 i
Route Distinguisher: 203.0.113.2:200
*>i[3]:[0]:[32]:[203.0.113.2]
                    203.0.113.2                   100      0 i
[...]

With GoBGP, use the following command:

# gobgp global rib -a evpn
    Network  Next Hop             AS_PATH              Age        Attrs
*>  [type:macadv][rd:203.0.113.2:100][esi:single-homed][etag:0][mac:50:54:33:00:00:0a][ip:<nil>][labels:[100]]203.0.113.2                               00:00:17   [{Origin: i} {LocalPref: 100} {Extcomms: [VXLAN], [65000:268435556]}]
*>  [type:macadv][rd:203.0.113.2:100][esi:single-homed][etag:0][mac:50:54:33:00:00:0b][ip:<nil>][labels:[100]]203.0.113.2                               00:00:17   [{Origin: i} {LocalPref: 100} {Extcomms: [VXLAN], [65000:268435556]}]
*>  [type:macadv][rd:203.0.113.2:200][esi:single-homed][etag:0][mac:50:54:33:00:00:0a][ip:<nil>][labels:[200]]203.0.113.2                               00:00:17   [{Origin: i} {LocalPref: 100} {Extcomms: [VXLAN], [65000:268435656]}]
*>  [type:multicast][rd:203.0.113.2:100][etag:0][ip:203.0.113.2]203.0.113.2                               00:00:17   [{Origin: i} {LocalPref: 100} {Extcomms: [VXLAN], [65000:268435556]}]
*>  [type:multicast][rd:203.0.113.2:200][etag:0][ip:203.0.113.2]203.0.113.2                               00:00:17   [{Origin: i} {LocalPref: 100} {Extcomms: [VXLAN], [65000:268435656]}]

Received routes

Each VTEP should have received the type 2 and type 3 routes from its fellow VTEPs, through the route reflectors. You can check with the show bgp evpn route command of vtysh.
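
If you prefer a one-shot command over an interactive session, vtysh can run it directly; the output is the same as in the previous section:

vtysh -c 'show bgp evpn route'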

Does Quagga correctly understand the received routes? The type 3 routes are translated into an association between the remote VTEPs and the VNIs:

# show evpn vni
Number of VNIs: 2
VNI        VxLAN IF              VTEP IP         # MACs   # ARPs   Remote VTEPs
100        vxlan100              203.0.113.2     4        0        203.0.113.3
                                                                   203.0.113.1
200        vxlan200              203.0.113.2     3        0        203.0.113.3
                                                                   203.0.113.1

The type 2 routes are translated to an association between the remote MACs and the remote VTEPs:

# show evpn mac vni 100
Number of MACs (local and remote) known for this VNI: 4
MAC               Type   Intf/Remote VTEP      VLAN
50:54:33:00:00:09 remote 203.0.113.1
50:54:33:00:00:0a local  eth1.100
50:54:33:00:00:0b local  eth2.100
50:54:33:00:00:0c remote 203.0.113.3

FDB configuration

The last step is to ensure Quagga has correctly provided the received information to the kernel. This can be checked with the bridge command:

# bridge fdb show dev vxlan100 | grep dst
00:00:00:00:00:00 dst 203.0.113.1 self permanent
00:00:00:00:00:00 dst 203.0.113.3 self permanent
50:54:33:00:00:0c dst 203.0.113.3 self
50:54:33:00:00:09 dst 203.0.113.1 self

All good! The first two lines are the translation of the type 3 routes (any BUM frame will be sent to both 203.0.113.1 and 203.0.113.3) and the last two are the translation of the type 2 routes.

Interoperability

One of the strengths of BGP EVPN is its interoperability with other network vendors. To demonstrate that it works as expected, we will configure a Juniper vMX to act as a VTEP.

First, we need to configure the physical bridge9. This is similar to the use of ip link and brctl with Linux. We only configure one physical interface with two old-school VLANs paired with matching VNIs.

interfaces {
    ge-0/0/1 {
        unit 0 {
            family bridge {
                interface-mode trunk;
                vlan-id-list [ 100 200 ];
            }
        }
    }
}
routing-instances {
    switch {
        instance-type virtual-switch;
        interface ge-0/0/1.0;
        bridge-domains {
            vlan100 {
                domain-type bridge;
                vlan-id 100;
                vxlan {
                    vni 100;
                    ingress-node-replication;
                }
            }
            vlan200 {
                domain-type bridge;
                vlan-id 200;
                vxlan {
                    vni 200;
                    ingress-node-replication;
                }
            }
        }
    }
}

Then, we configure BGP EVPN to advertise all known VNIs. The configuration is quite similar to the one we did with Quagga:

protocols {
    bgp {
        group fabric {
            type internal;
            multihop;
            family evpn signaling;
            local-address 203.0.113.3;
            neighbor 203.0.113.253;
            neighbor 203.0.113.254;
        }
    }
}

routing-instances {
    switch {
        vtep-source-interface lo0.0;
        route-distinguisher 203.0.113.3:1; # ❶
        vrf-import EVPN-VRF-VXLAN;
        vrf-target {
            target:65000:1;
            auto;
        }
        protocols {
            evpn {
                encapsulation vxlan;
                extended-vni-list all;
                multicast-mode ingress-replication;
            }
        }
    }
}

routing-options {
    router-id 203.0.113.3;
    autonomous-system 65000;
}

policy-options {
    policy-statement EVPN-VRF-VXLAN {
        then accept;
    }
}

We also need a small compatibility patch for Cumulus Quagga10.

The routes sent by this configuration are very similar to the routes sent by Quagga. The main differences are:

  • on JunOS, the route distinguisher is configured statically (in ❶), and
  • on JunOS, the VNI is also encoded as an Ethernet tag ID.

Here is a type 3 route, as sent by JunOS:

> show route table bgp.evpn.0 receive-protocol bgp 203.0.113.3 extensive

bgp.evpn.0: 13 destinations, 13 routes (13 active, 0 holddown, 0 hidden)
* 3:203.0.113.3:1::100::203.0.113.3/304 IM (1 entry, 1 announced)
     Accepted
     Route Distinguisher: 203.0.113.3:1
     Nexthop: 203.0.113.3
     Localpref: 100
     AS path: I
     Communities: target:65000:268435556 encapsulation:vxlan(0x8)
     PMSI: Flags 0x0: Label 6: Type INGRESS-REPLICATION 203.0.113.3
[...]

Here is a type 2 route:

> show route table bgp.evpn.0 receive-protocol bgp 203.0.113.3 extensive

bgp.evpn.0: 13 destinations, 13 routes (13 active, 0 holddown, 0 hidden)
* 2:203.0.113.3:1::200::50:54:33:00:00:0f/304 MAC/IP (1 entry, 1 announced)
     Accepted
     Route Distinguisher: 203.0.113.3:1
     Route Label: 200
     ESI: 00:00:00:00:00:00:00:00:00:00
     Nexthop: 203.0.113.3
     Localpref: 100
     AS path: I
     Communities: target:65000:268435656 encapsulation:vxlan(0x8)
[...]

We can check that the vMX is able to make sense of the routes it receives from its peers running Quagga:

> show evpn database l2-domain-id 100
Instance: switch
VLAN  DomainId  MAC address        Active source                  Timestamp        IP address
     100        50:54:33:00:00:0c  203.0.113.1                    Apr 30 12:46:20
     100        50:54:33:00:00:0d  203.0.113.2                    Apr 30 12:32:42
     100        50:54:33:00:00:0e  203.0.113.2                    Apr 30 12:46:20
     100        50:54:33:00:00:0f  ge-0/0/1.0                     Apr 30 12:45:55

On the other end, if we look at one of the Quagga-based VTEPs, we can check that the received routes are correctly understood:

# show evpn vni 100
VNI: 100
 VxLAN interface: vxlan100 ifIndex: 9 VTEP IP: 203.0.113.1
 Remote VTEPs for this VNI:
  203.0.113.3
  203.0.113.2
 Number of MACs (local and remote) known for this VNI: 4
 Number of ARPs (IPv4 and IPv6, local and remote) known for this VNI: 0
# show evpn mac vni 100
Number of MACs (local and remote) known for this VNI: 4
MAC               Type   Intf/Remote VTEP      VLAN
50:54:33:00:00:0c local  eth1.100
50:54:33:00:00:0d remote 203.0.113.2
50:54:33:00:00:0e remote 203.0.113.2
50:54:33:00:00:0f remote 203.0.113.3

Get in touch if you have some success with other vendors!


  1. For example, they may use bridges to connect containers together. 

  2. Such a feature can replace proprietary implementations of MC-LAG, allowing several VTEPs to act as an endpoint for a single link aggregation group. This is not needed in our scenario, where hypervisors act as VTEPs. 

  3. The development of Quagga is slow and “closed”. New features are often stalled. FRR is placed under the umbrella of the Linux Foundation, has a GitHub-centered development model and an election process. It already has several interesting enhancements (notably, BGP add-path, BGP unnumbered, MPLS and LDP). 

  4. I am unenthusiastic about projects whose sole purpose is to rewrite something in Go. However, while quite young, GoBGP is quite valuable on its own (good architecture, good performance). 

  5. The 48-port version is around $10,000 with the BGP license. 

  6. An empty chassis with a dual routing engine (RE-S-1800X4-16G) is around $30,000. 

  7. I don’t know how pricey the vRR is. For evaluation purposes, it can be downloaded for free if you are a customer. 

  8. The value 100 used in the route distinguisher (203.0.113.2:100) is not the one used to encode the VNI. The VNI is encoded in the route target (65000:268435556), in the 24 least significant bits (268435556 & 0xffffff equals 100). As long as VNIs are unique, we don’t have to understand those details. 

  9. For some reason, the use of a virtual switch is mandatory. This is specific to this platform: a QFX doesn’t require this. 

  10. The encoding of the VNI into the route target is being standardized in draft-ietf-bess-evpn-overlay. Juniper already implements this draft. 

by Vincent Bernat at May 03, 2017 09:56 PM

VXLAN & Linux

VXLAN is an overlay network to carry Ethernet traffic over an existing (highly available and scalable) IP network while accommodating a very large number of tenants. It is defined in RFC 7348.

Starting from Linux 3.12, the VXLAN implementation is quite complete as both multicast and unicast are supported as well as IPv6 and IPv4. Let’s explore the various methods to configure it.

VXLAN setup

To illustrate our examples, we use the following setup:

  • an underlay IP network (highly available and scalable, possibly the Internet),
  • three Linux bridges acting as VXLAN tunnel endpoints (VTEP),
  • four servers believing they share a common Ethernet segment.

A VXLAN tunnel extends the individual Ethernet segments across the three bridges, providing a unique (virtual) Ethernet segment. From one host (e.g. H1), we can reach directly all the other hosts in the virtual segment:

$ ping -c10 -w1 -t1 ff02::1%eth0
PING ff02::1%eth0(ff02::1%eth0) 56 data bytes
64 bytes from fe80::5254:33ff:fe00:8%eth0: icmp_seq=1 ttl=64 time=0.016 ms
64 bytes from fe80::5254:33ff:fe00:b%eth0: icmp_seq=1 ttl=64 time=4.98 ms (DUP!)
64 bytes from fe80::5254:33ff:fe00:9%eth0: icmp_seq=1 ttl=64 time=4.99 ms (DUP!)
64 bytes from fe80::5254:33ff:fe00:a%eth0: icmp_seq=1 ttl=64 time=4.99 ms (DUP!)

--- ff02::1%eth0 ping statistics ---
1 packets transmitted, 1 received, +3 duplicates, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.016/3.745/4.991/2.152 ms

Basic usage

The reference deployment for VXLAN is to use an IP multicast group to join the other VTEPs:

# ip -6 link add vxlan100 type vxlan \
>   id 100 \
>   dstport 4789 \
>   local 2001:db8:1::1 \
>   group ff05::100 \
>   dev eth0 \
>   ttl 5
# brctl addbr br100
# brctl addif br100 vxlan100
# brctl addif br100 vnet22
# brctl addif br100 vnet25
# brctl stp br100 off
# ip link set up dev br100
# ip link set up dev vxlan100

The above commands create a new interface acting as a VXLAN tunnel endpoint, named vxlan100, and put it in a bridge with some regular interfaces1. Each VXLAN segment is associated with a 24-bit segment ID, the VXLAN Network Identifier (VNI). In our example, the default VNI is specified with id 100.

When VXLAN was first implemented in Linux 3.7, the UDP port to use was not defined. Several vendors were using 8472 and Linux took the same value. To avoid breaking existing deployments, this is still the default value. Therefore, if you want to use the IANA-assigned port, you need to explicitly set it with dstport 4789.

As we want to use multicast, we have to specify a multicast group to join (group ff05::100), as well as a physical device (dev eth0). With multicast, the default TTL is 1. If your multicast network leverages some routing, you’ll have to increase the value a bit, like here with ttl 5.

The vxlan100 device acts as a bridge device with remote VTEPs as virtual ports:

  • it sends broadcast, unknown unicast and multicast (BUM) frames to all VTEPs using the multicast group, and
  • it discovers the association from Ethernet MAC addresses to VTEP IP addresses using source-address learning.

The following figure summarizes the configuration, with the FDB of the Linux bridge (learning local MAC addresses) and the FDB of the VXLAN device (learning distant MAC addresses):

Bridged VXLAN device

The FDB of the VXLAN device can be observed with the bridge command. If the destination MAC is present, the frame is sent to the associated VTEP (unicast). The all-zero address is only used when a lookup for the destination MAC fails.

# bridge fdb show dev vxlan100 | grep dst
00:00:00:00:00:00 dst ff05::100 via eth0 self permanent
50:54:33:00:00:0b dst 2001:db8:3::1 self
50:54:33:00:00:08 dst 2001:db8:1::1 self

If you are interested to get more details on how to setup a multicast network and build VXLAN segments on top of it, see my “Network virtualization with VXLAN” article.

Without multicast

Using VXLAN over a multicast IP network has several benefits:

  • automatic discovery of other VTEPs sharing the same multicast group,
  • good bandwidth usage (packets are replicated as late as possible),
  • decentralized and controller-less design2.

However, multicast is not available everywhere and managing it at scale can be difficult. In Linux 3.8, the DOVE extensions were added to the VXLAN implementation, removing the dependency on multicast.

Unicast with static flooding

We can replace multicast with head-end replication of BUM frames to a statically configured list of remote VTEPs3:

# ip -6 link add vxlan100 type vxlan \
>   id 100 \
>   dstport 4789 \
>   local 2001:db8:1::1
# bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 2001:db8:2::1
# bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 2001:db8:3::1

The VXLAN is defined without a remote multicast group. Instead, all the remote VTEPs are associated with the all-zero address: a BUM frame will be duplicated to all those destinations. The VXLAN device will still learn remote addresses automatically using source-address learning.

It is a very simple solution. With a bit of automation, you can keep the default FDB entries up-to-date easily, as sketched below. However, the host will have to duplicate each BUM frame (head-end replication) as many times as there are remote VTEPs. This is quite reasonable if you have a dozen of them; it may get out of hand if you have thousands.
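
A sketch of such automation, assuming the list of remote VTEPs is maintained in a file with one address per line (the file name is made up):

# Rebuild the head-end replication entries from an inventory file
# (remove the previous default entries first if the list shrank).
while read vtep; do
    bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst ${vtep}
done < /etc/vxlan/remote-vteps.txt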

Cumulus vxfld daemon is an example of use of this strategy (in the head-end replication mode).

Unicast with static L2 entries

When the associations of MAC addresses and VTEPs are known, it is possible to pre-populate the FDB and disable learning:

# ip -6 link add vxlan100 type vxlan \
>   id 100 \
>   dstport 4789 \
>   local 2001:db8:1::1 \
>   nolearning
# bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 2001:db8:2::1
# bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 2001:db8:3::1
# bridge fdb append 50:54:33:00:00:09 dev vxlan100 dst 2001:db8:2::1
# bridge fdb append 50:54:33:00:00:0a dev vxlan100 dst 2001:db8:2::1
# bridge fdb append 50:54:33:00:00:0b dev vxlan100 dst 2001:db8:3::1

Thanks to the nolearning flag, source-address learning is disabled. Therefore, if a MAC is missing, the frame will always be sent using the all-zero entries.

The all-zero entries are still needed for broadcast and multicast traffic (e.g. ARP and IPv6 neighbor discovery). This kind of setup works well to provide virtual L2 networks to virtual machines (no L3 information available). You need some glue to update the FDB entries.

BGP EVPN with Cumulus Quagga is an example of use of this strategy (see “VXLAN: BGP EVPN with Cumulus Quagga” for additional information).

Unicast with static L3 entries

In the previous example, we had to keep the all-zero entries for ARP and IPv6 neighbor discovery to work correctly. However, Linux can answer neighbor requests on behalf of the remote nodes4. When this feature is enabled, the default entries are not needed anymore (but you could keep them):

# ip -6 link add vxlan100 type vxlan \
>   id 100 \
>   dstport 4789 \
>   local 2001:db8:1::1 \
>   nolearning \
>   proxy
# ip -6 neigh add 2001:db8:ff::11 lladdr 50:54:33:00:00:09 dev vxlan100
# ip -6 neigh add 2001:db8:ff::12 lladdr 50:54:33:00:00:0a dev vxlan100
# ip -6 neigh add 2001:db8:ff::13 lladdr 50:54:33:00:00:0b dev vxlan100
# bridge fdb append 50:54:33:00:00:09 dev vxlan100 dst 2001:db8:2::1
# bridge fdb append 50:54:33:00:00:0a dev vxlan100 dst 2001:db8:2::1
# bridge fdb append 50:54:33:00:00:0b dev vxlan100 dst 2001:db8:3::1

This setup totally eliminates head-end replication. However, protocols relying on multicast won’t work either. With some automation, this is a setup that should work well with containers: if there is a registry keeping a list of all IP and MAC addresses in use, a program could listen to it and adjust the FDB and the neighbor tables.
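
As a sketch, assuming the registry can be dumped as one “IP MAC VTEP” triple per line (a made-up format), the glue could be as small as this:

# Push the registry content into the neighbor table and the FDB.
while read ip mac vtep; do
    ip -6 neigh replace ${ip} lladdr ${mac} dev vxlan100
    bridge fdb replace ${mac} dev vxlan100 dst ${vtep}
done < /var/run/registry/vxlan100.txt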

The VXLAN backend of Docker’s libnetwork is an example of use of this strategy (but it also uses the next method).

Unicast with dynamic L3 entries

Linux can also notify a program when an (L2 or L3) entry is missing. The program can then query some central registry and dynamically add the requested entry. However, for L2 entries, notifications are issued only if:

  • the destination MAC address is not known,
  • there is no all-zero entry in the FDB, and
  • the destination MAC address is not a multicast or broadcast one.

Those limitations prevent us from doing a “unicast with dynamic L2 entries” scenario.

First, let’s create the VXLAN device with the l2miss and l3miss options5:

ip -6 link add vxlan100 type vxlan \
   id 100 \
   dstport 4789 \
   local 2001:db8:1::1 \
   nolearning \
   l2miss \
   l3miss \
   proxy

Notifications are sent to programs listening to an AF_NETLINK socket using the NETLINK_ROUTE protocol. The socket needs to be bound to the RTNLGRP_NEIGH group. The following command does exactly that and decodes the received notifications:

# ip monitor neigh dev vxlan100
miss 2001:db8:ff::12 STALE
miss lladdr 50:54:33:00:00:0a STALE

The first notification is about a missing neighbor entry for the requested IP address. We can add it with the following command:

ip -6 neigh replace 2001:db8:ff::12 \
    lladdr 50:54:33:00:00:0a \
    dev vxlan100 \
    nud reachable

The entry is not permanent so that we don’t need to delete it when it expires. If the address becomes stale, we will get another notification to refresh it.

Once the host receives our proxy answer for the neighbor discovery request, it can send a frame with the MAC we gave as destination. The second notification is about the missing FDB entry for this MAC address. We add the appropriate entry with the following command6:

bridge fdb replace 50:54:33:00:00:0a \
    dst 2001:db8:2::1 \
    dev vxlan100 dynamic

The entry is not permanent either, as a permanent entry would prevent the MAC from migrating to the local VTEP (a dynamic entry cannot override a permanent entry).

This setup works well with containers and a global registry. However, there is a small latency penalty for the first connections. Moreover, multicast and broadcast won’t be available in the overlay network. The VXLAN backend for flannel, a network fabric for Kubernetes, is an example of this strategy.

Decision

There is no one-size-fits-all solution.

You should consider the multicast solution if:

  • you are in an environment where multicast is available,
  • you are ready to operate (and scale) a multicast network,
  • you need multicast and broadcast inside the virtual segments,
  • you don’t have L2/L3 addresses available beforehand.

The scalability of such a solution is pretty good if you take care of not putting all VXLAN interfaces into the same multicast group (e.g. use the last byte of the VNI as the last byte of the multicast group).
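
As an illustration, here is one way to derive the group from the VNI; the ff05:: prefix reuses the group range from the earlier example and is an assumption:

# Derive the multicast group from the last byte of the VNI.
vni=100
group="ff05::$(printf '%x' $((vni & 255)))"
ip -6 link add vxlan${vni} type vxlan \
   id ${vni} \
   dstport 4789 \
   local 2001:db8:1::1 \
   group ${group} \
   dev eth0 \
   ttl 5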

When multicast is not available, another generic solution is BGP EVPN: BGP is used as a controller to ensure distribution of the list of VTEPs and their respective FDBs. As mentioned earlier, an implementation of this solution is Cumulus Quagga. I explore this option in a separate post: VXLAN: BGP EVPN with Cumulus Quagga.

If you operate in a container-like environment where L2/L3 addresses are known beforehand, a solution using static and/or dynamic L2 and L3 entries based on a central registry and no source-address learning would also fit the bill. This provides a more secure solution (bounded resources, dampened MiTM attacks, no way to amplify bandwidth usage through excessive broadcast). Various environment-specific solutions are available7, or you can build your own.

Other considerations

Independently of the chosen strategy, here are a few important points to keep in mind when implementing a VXLAN overlay.

Isolation

While you may expect VXLAN interfaces to only carry L2 traffic, Linux doesn’t disable IP processing. If the destination MAC is a local one, Linux will route or deliver the encapsulated IP packet. Check my post about the proper isolation of a Linux bridge.

Encryption

VXLAN enforces isolation between tenants, but the traffic is totally unencrypted. The most direct solution to provide encryption is to use IPsec. Some container-based solutions may come with IPsec support out of the box (notably Docker’s libnetwork, but flannel has plans for it too). This is quite important for a deployment over a public cloud.

Overhead

The format of a VXLAN-encapsulated frame is the following:

VXLAN encapsulation

VXLAN adds a fixed overhead of 50 bytes. If you also use IPsec, the overhead depends on many factors. In transport mode, with AES and SHA256, the overhead is 56 bytes. With NAT traversal, this is 64 bytes (additional UDP header). In tunnel mode, this is 72 bytes. See Cisco IPsec Overhead Calculator Tool.

Some users will expect to be able to use an Ethernet MTU of 1500 for the overlay network. Therefore, the underlay MTU should be increased. If it is not possible, ensure the inner MTU (inside the containers or the virtual machines) is correctly decreased8.
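
Concretely, using the 50-byte figure above (an IPv6 underlay adds 20 more bytes), that means one of the following adjustments; interface names are assumptions:

# On the hypervisor: raise the underlay MTU so the overlay keeps 1500.
ip link set mtu 1550 dev eth0
# Or, inside the VM/container: lower the inner MTU instead.
ip link set mtu 1450 dev eth0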

IPv6

While all the examples above are using IPv6, the ecosystem is not quite ready yet. The multicast L2-only strategy works fine with IPv6 but every other scenario currently needs some patches (1, 2, 3).

On top of that, IPv6 support may still be missing from some VXLAN-related tools.

Multicast

The Linux VXLAN implementation doesn’t support IGMP snooping: multicast traffic will be broadcast to all VTEPs unless multicast MAC addresses are inserted into the FDB.


  1. This is one possible implementation. The bridge is only needed if you require some form of source-address learning for local interfaces. Another strategy is to use MACVLAN interfaces. 

  2. The underlay multicast network may still need some central components, like rendezvous points for the PIM-SM protocol. Fortunately, it’s possible to make them highly available and scalable (e.g. with Anycast-RP, RFC 4610). 

  3. For this example and the following ones, a patch is needed for the ip command (to be included in 4.11) to use IPv6 for transport. In the meantime, here is a quick workaround:

    # ip -6 link add vxlan100 type vxlan \
    >   id 100 \
    >   dstport 4789 \
    >   local 2001:db8:1::1 \
    >   remote 2001:db8:2::1
    # bridge fdb append 00:00:00:00:00:00 \
    >   dev vxlan100 dst 2001:db8:3::1
    

  4. You may have to apply an IPv6-related patch to the kernel (to be included in 4.12). 

  5. You have to apply an IPv6-related patch to the kernel (to be included in 4.12) to get appropriate notifications for missing IPv6 addresses. 

  6. Directly adding the entry after the first notification would have been smarter to avoid unnecessary retransmissions. 

  7. flannel and Docker’s libnetwork were already mentioned as they both feature a VXLAN backend. There are also some interesting experiments like BaGPipe BGP for Kubernetes which leverages BGP EVPN and is therefore interoperable with other vendors. 

  8. There is no such thing as MTU discovery on an Ethernet segment. 

by Vincent Bernat at May 03, 2017 09:56 PM

May 01, 2017

Ben's Practical Admin Blog

It’s Nice to be Appreciated

Quite out of the blue I had a parcel arrive on my doorstep while I was on leave. It was notable in that I had already received all the parcels I thought I was expecting.

As it turns out, the fine administration over at ITPA felt that I do quite a bit to help out this organisation and wanted to thank me with a signed copy of Tom Limoncelli’s tome, The Practice of System and Network Administration.

I feel suitably humble. It’s nice to be appreciated 🙂


by Ben at May 01, 2017 09:20 AM

Electricmonk.nl

ScriptForm v1.3: Webserver that automatically generates forms to serve as frontends to scripts

I've just released v1.3 of Scriptform.

ScriptForm is a stand-alone webserver that automatically generates forms from JSON to serve as frontends to scripts. It takes a JSON file which contains form definitions, constructs web forms from this JSON and serves these to users over HTTP. The user can select a form and fill it out. When the user submits the form, it is validated and the associated script is called. Data entered in the form is passed to the script through the environment.

You can get the new release from the Releases page.

More information, including screenshots, can be found on the Github page.

by admin at May 01, 2017 06:51 AM

April 28, 2017

The Lone Sysadmin

Intel’s Memory Drive Implementation for Optane Guarantees its Doom

A few weeks ago Intel started releasing their Optane product, a commercialization of the 3D Xpoint (Crosspoint) technology they’ve been talking about for a few years. Predictably, there has been a lot of commentary in all directions. Did you know it’s game changing, or that it’s a solution looking for a problem? It’s storage. It […]

The post Intel’s Memory Drive Implementation for Optane Guarantees its Doom appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at April 28, 2017 08:26 PM

April 22, 2017

Pantz.org

DNS over TLS. Secure the DNS!

Secure the Web

A while back we had the "secure the web" initiative, where everyone was inspired to enable encryption (https) on their websites. This was so we could thwart things like eavesdropping and content hijacking. In 2016 about half of website visits were https. This is great and things seem to be only getting better. ISPs cannot see the content of https traffic. Not seeing your traffic content anymore makes them sad. What makes them happy? They can still see all of your DNS requests.

Your ISP can see the websites you visit

Every ISP assigns you some of their DNS servers to use when you connect to them for your internet connection. Every time you type a website name in your browser bar, a request goes to their DNS servers to look up a number called an IP address. Once an IP address is returned to your computer, the connection to the website is made. Your ISP now has a log of the website you requested, attached to the rest of the information they have about you. Then they build profiles about you and sell that info to 3rd parties to target advertising to you in many different ways. Think you'll be slick and switch out their DNS servers for someone else's, like Google's free DNS servers (8.8.8.8)? Think again. Any request through your ISP to any DNS server on the internet is unencrypted. Your ISP can slurp up all the same requests and get the same info they did when you were using their DNS servers, just like when they could see all of your content with http before https. This also means that your DNS traffic could possibly be intercepted and hijacked by other people or entities. TLS prevents this.

Securing the DNS

The thing that secures https is Transport Layer Security (TLS). It is a set of cryptographic protocols that provides communications security over a computer network. Now that we are beginning to secure websites, I think it is high time we secure the DNS. Others seem to agree. In 2016, rfc7858 and rfc8094 were submitted to the Internet Engineering Task Force (IETF); they describe the use of DNS over TLS and DNS over DTLS. Hopefully these will eventually become standards and all DNS traffic will be more secure in transit. Can you have DNS over TLS today? Yes you can!

Trying DNS over TLS now

DNS over TLS is in its infancy currently, but there are ways to try it out now. You could try using Stubby, a program that acts as a local DNS Privacy stub resolver (using DNS-over-TLS). You will have to compile Stubby on Linux or OS X to use it. You could also set up your own DNS server at home and point it at some upstream forwarders that support DNS over TLS. This is what I have done to start testing this. I use Unbound as my local DNS server on my LAN for all of my client machines, and I used the configuration settings provided by our friends over at Calomel.org to set up my Unbound server to use DNS over TLS. Here is a list of other open source DNS software that can use DNS over TLS. With my Unbound setup, all of my DNS traffic is secured from interception and modification. So how is my testing going?
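
For the curious, the gist of such an Unbound setup is a forwarding zone over port 853 with TLS enabled for upstream queries. This is only a minimal sketch, not the exact Calomel.org settings, and the resolver address is a placeholder you must replace with one you trust:

# unbound.conf fragment -- a minimal sketch, not a complete configuration
server:
    ssl-upstream: yes
forward-zone:
    name: "."
    forward-addr: 192.0.2.53@853   # placeholder: a DNS over TLS resolver you trust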

The early days

Since this is not an IETF standard yet, there are not a lot of providers of DNS over TLS resolvers. I have had to rearrange my list of DNS over TLS providers a few times when some of the servers were just not resolving hostnames. The latency is also higher than using your local ISP's DNS servers or someone's like Google's DNS servers. This is not very noticeable since my local DNS server caches the lookups. I have a feeling the generous providers of these DNS over TLS services are being overwhelmed and cannot handle the load of requests. This is where bigger companies come into play.

Places like Google or OpenDNS do not support DNS over TLS yet, but I'm hoping that they will get on board with this. Google especially, since they have been big proponents of making all websites https, and they have the infrastructure to pull this off. Even if someone like Google turned this on, that means they get your DNS traffic instead of your ISP. Will this ever end?

Uggh, people can still see my traffic

Let's face it, if you're connected to the internet, at some point someone gets to see where you're going. You just have to choose who you want/trust to give this information to. If you point your DNS servers to Google, they get to see your DNS requests. If I point my DNS at these test DNS over TLS servers, then they get to see my DNS traffic. It seems like the lesser of two evils to send your DNS to 2nd party DNS servers rather than to your ISP. If you use your ISP's DNS servers, they know the exact name attached to the IP address they assigned you and the customer that is making the query. I have been holding off telling you the bad news: https SNI will still give up the domain names you visit.

Through all of this, even if you point your DNS traffic to a DNS over TLS server, your ISP can still see many of the sites you go to. This is thanks to something in https called Server Name Indication (SNI). When you make a connection to an https enabled website there is a process called a handshake, the exchange of information before the encryption starts. During this unencrypted handshake (ClientHello), one of the things you send is the remote host name. This allows the server on the other end to choose the appropriate certificate based on the requested host name, which matters when multiple virtual hosts reside on the same server, a very common setup. Unfortunately, your ISP can see this, slurp it up, and log it to your account/profile. So now what?

Would a VPN help? Yes, but remember that now your DNS queries go to your VPN provider. What is nice is that your ISP will not see any of your traffic anymore; that pesky SNI issue mentioned above goes away when using a VPN. But now your trusted endpoint is your VPN provider, and they can log all the sites you go to. So choose wisely when picking a VPN provider: read their policy on saving logs, and choose one that will allow you to pay with Bitcoin so you will be as anonymous as possible. With a VPN provider you also have to be careful about DNS leaking. If your VPN client is not configured right, or you forget to turn it on, or any of the other myriad ways a VPN can fail, your traffic will go right back to your ISP.

Even VPN's don't make you anonymous

So you have encrypted your DNS and web traffic with TLS and you're using a VPN. Good for you; now your privacy is a bit better, but not your anonymity. You're still being tracked, this time by ad networks and the services you use. I'm not going to go into this as many other people have written on this topic. Just know that you're being tracked one way or another.

I know this all seems hopeless, but securing the web's infrastructure bit by bit helps improve privacy just a little more. DNS, like http, is unencrypted. There was a big push to get websites to encrypt their data; now the same attention needs to be given to DNS.

by Pantz.org at April 22, 2017 09:59 PM

April 19, 2017

That grumpy BSD guy

Forcing the password gropers through a smaller hole with OpenBSD's PF queues

While preparing material for the upcoming BSDCan PF and networking tutorial, I realized that the pop3 gropers were actually not much fun to watch anymore. So I used the traffic shaping features of my OpenBSD firewall to let the miscreants inflict some pain on themselves. Watching logs became fun again.

Yes, in between a number of other things I am currently in the process of creating material for a new and hopefully better PF and networking session.

I've been fishing for suggestions for topics to include in the tutorials on relevant mailing lists, and one suggestion that keeps coming up (even though it's actually covered in the existing slides as well as The Book of PF) is using traffic shaping features to punish undesirable activity, such as the scenario Dan suggested for his website.


What Dan had in mind here may very well end up in the new slides, but in the meantime I will show you how to punish abusers of essentially any service with the tools at hand in your OpenBSD firewall.

Regular readers will know that I'm responsible for maintaining a set of mail services including a pop3 service, and that our site sees pretty much round-the-clock attempts at logging on to that service with user names that come mainly from the local part of the spamtrap addresses that are part of the system to produce our hourly list of greytrapped IP addresses.

But do not let yourself be distracted by this bizarre collection of items that I've maintained and described in earlier columns. The actual useful parts of this article follow - take this as a walkthrough of how to mitigate a wide range of threats and annoyances.

First, analyze the behavior that you want to defend against. In our case that's fairly obvious: we have a service that's getting a volume of unwanted traffic, and looking at our logs the attempts come in fairly quickly, with a number of repeated attempts from each source address. This is similar enough to both the traditional ssh bruteforce attacks and, for that matter, to Dan's website scenario that we can reuse some of the same techniques in all of the configurations.

I've written about the rapid-fire ssh bruteforce attacks and their mitigation before (and of course it's in The Book of PF) as well as the slower kind where those techniques actually come up short. The traditional approach to ssh bruteforcers has been to simply block their traffic, and the state-tracking features of PF let you set up overload criteria that add the source addresses to the table that holds the addresses you want to block.

I have rules much like the ones in the example in place where I have an SSH service running, and those bruteforce tables are never totally empty.

For the system that runs our pop3 service, we also have a PF ruleset in place with queues for traffic shaping. For some odd reason that ruleset is fairly close to the HFSC traffic shaper example in The Book of PF, and it contains a queue that I set up mainly as an experiment to annoy spammers (as in, the ones that are already for one reason or another blacklisted by our spamd).

The queue is defined like this:

   queue spamd parent rootq bandwidth 1K min 0K max 1K qlimit 300

yes, that's right. A queue with a maximum throughput of 1 kilobit per second. I have been warned that this is small enough that the code may be unable to strictly enforce that limit due to the timer resolution in the HFSC code. But that didn't keep me from trying.

And now that I had another group of hosts that I wanted to just be a little evil to, why not let the password gropers and the spammers share the same small patch of bandwidth?

Now a few small additions to the ruleset are needed for the good to put the evil to the task. We start with a table to hold the addresses we want to mess with. Actually, I'll add two, for reasons that will become clear later:

table <longterm> persist counters
table <popflooders> persist counters 

 
The rules that use those tables are:

block drop log (all) quick from <longterm> 


pass in quick log (all) on egress proto tcp from <popflooders> to port pop3 flags S/SA keep state \ 
(max-src-conn 2, max-src-conn-rate 3/3, overload <longterm> flush global, pflow) set queue spamd 

pass in log (all) on egress proto tcp to port pop3 flags S/SA keep state \ 
(max-src-conn 5, max-src-conn-rate 6/3, overload <popflooders> flush global, pflow) 
 
The last one lets anybody connect to the pop3 service, but any one source address can have only five simultaneous connections open, and at a rate of at most six over three seconds.

Any source that trips up one of these restrictions is overloaded into the popflooders table; the flush global part means any existing connections that source has are terminated, and when they get to try again, they will instead match the quick rule that assigns the new traffic to the 1 kilobit queue.

The quick rule here has even stricter limits on the number of allowed simultaneous connections, and this time any breach will lead to membership of the longterm table and the block drop treatment.

For the longterm table I already had a four week expiry in place (see man pfctl for details on how to do that), and I haven't gotten around to deciding what, if any, expiry I will set up for the popflooders.
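
For reference, the kind of pfctl command that implements such an expiry when run periodically looks like this (2419200 seconds is the four weeks mentioned above):

pfctl -t longterm -T expire 2419200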

The results were immediately visible. Monitoring the queues using pfctl -vvsq shows the tiny queue works as expected:

 queue spamd parent rootq bandwidth 1K, max 1K qlimit 300
  [ pkts:     196136  bytes:   12157940  dropped pkts: 398350 bytes: 24692564 ]
  [ qlength: 300/300 ]
  [ measured:     2.0 packets/s, 999.13 b/s ]


and looking at the pop3 daemon's log entries, a typical encounter looks like this:

Apr 19 22:39:33 skapet spop3d[44875]: connect from 111.181.52.216
Apr 19 22:39:33 skapet spop3d[75112]: connect from 111.181.52.216
Apr 19 22:39:34 skapet spop3d[57116]: connect from 111.181.52.216
Apr 19 22:39:34 skapet spop3d[65982]: connect from 111.181.52.216
Apr 19 22:39:34 skapet spop3d[58964]: connect from 111.181.52.216
Apr 19 22:40:34 skapet spop3d[12410]: autologout time elapsed - 111.181.52.216
Apr 19 22:40:34 skapet spop3d[63573]: autologout time elapsed - 111.181.52.216
Apr 19 22:40:34 skapet spop3d[76113]: autologout time elapsed - 111.181.52.216
Apr 19 22:40:34 skapet spop3d[23524]: autologout time elapsed - 111.181.52.216
Apr 19 22:40:34 skapet spop3d[16916]: autologout time elapsed - 111.181.52.216


Here the miscreant comes in way too fast and only manages to get five connections going before being shunted to the tiny queue to fight it out with known spammers for a share of the bandwidth.

I've been running with this particular setup since Monday evening around 20:00 CEST, and by late Wednesday evening the number of entries in the popflooders table had reached approximately 300.

I will decide on an expiry policy at some point, I promise. In fact, I welcome your input on what the expiry period should be.

One important takeaway from this, and possibly the most important point of this article, is that it does not take a lot of imagination to retool this setup to watch for and protect against undesirable activity directed at essentially any network service.

You pick the service and the ports it uses, then figure out what are the parameters that determine what is acceptable behavior. Once you have those parameters defined, you can choose to assign to a minimal queue like in this example, block outright, redirect to something unpleasant or even pass with a low probability.

All of those possibilities are part of the normal pf.conf toolset on your OpenBSD system. If you want, you can supplement these mechanisms with a bit of log file parsing that produces output suitable for feeding to pfctl to add to the table of miscreants. The only limits are, as always, the limits of your imagination (and possibly your programming abilities). If you're wondering why I like OpenBSD so much, you can find at least a partial answer in my OpenBSD and you presentation.

FreeBSD users will be pleased to know that something similar is possible on their systems too, only substituting the legacy ALTQ traffic shaping with its somewhat arcane syntax for the modern queues rules in this article.

Will you be attending our PF and networking session in Ottawa, or will you want to attend one elsewhere later? Please let us know at the email address in the tutorial description.



Update 2017-04-23: A truly unexpiring table, and downloadable datasets made available

Soon after publishing this article I realized that what I had written could easily be taken as a promise to keep a collection of POP3 gropers' IP addresses around indefinitely, in a table where the entries never expire.

Table entries do not expire unless you use a pfctl(8) command like the ones mentioned in the book and other resources I referenced earlier in the article, but on the other hand table entries will not survive a reboot either unless you arrange to have table contents stored to somewhere more permanent and restored from there. Fortunately our favorite toolset has a feature that implements at least the restoring part.

Changing the table definition quoted earlier to read

 table <popflooders> persist counters file "/var/tmp/popflooders"

takes care of the restoring part; the backing up is a matter of setting up a cron(8) job to dump the current contents of the table to the file that will be loaded into the table at ruleset load.
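
Such a cron job can be a single crontab(5) line; the schedule here is just an example, mirroring the five-past-the-hour dump mentioned in the next paragraph:

5 * * * * /sbin/pfctl -t popflooders -T show > /var/tmp/popflooders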

Then today I made another tiny change and made the data available for download. The popflooders table is dumped at five past every full hour to pop3gropers.txt, a file designed to be read by anything that takes a list of IP addresses and ignores lines starting with the # comment character. I am sure you can think of suitable applications.

In addition, the same script does a verbose dump, including table statistics for each entry, to pop3gropers_full.txt for readers who are interested in such things as when an entry was created and how much traffic those hosts produced, keeping in mind that those hosts are not actually blocked here, only subjected to a tiny bandwidth allocation.

As it says in the comment at the top of both files, you may use the data as you please for your own purposes; for any re-publishing or integration into other data sets, please contact me via the means listed in the bsdly.net whois record.

As usual I will answer any reasonable requests for further data such as log files, but do not expect prompt service and keep in mind that I am usually in the Central European time zone (CEST at the moment).

I suppose we should see this as a tiny, incremental evolution of the "Cybercrime Robot Torture As A Service" (CRTAAS) concept.

Update 2017-04-29: While the world was not looking, I supplemented the IP address dumps with two further versions: one with geoip location data added, and a per-country summary based on that geoip location data.

Spending a few minutes with an IP address dump like the one described here and whois data is a useful exercise for anyone investigating incidents of this type. This .csv file is based on the 2017-04-29T1105 dump (preserved for reference), and reveals that not only do the majority of attempts come from one country, but also that a very limited number of organizations within that country are responsible for the most active networks.

The spammer blacklist (see this post for background) was of course ripe for the same treatment, so now in addition to the familiar blacklist, that too comes with a geoiplocation annotated version and a per country summary.

Note that all of those files except the .csv file with whois data are products of automatic processes. Please contact me (the email address in the files works) if you have any questions or concerns.

Update 2017-05-17: After running with the autofilling tables for approximately a month (and, I must confess, extracting bad login attempts that didn't actually trigger the overload at semi-random but roughly daily intervals), I thought I'd check a few things about the catch. I already knew roughly how many hosts there were in total, but how many were contacting us via IPv6? Let's see:

[Wed May 17 19:38:02] peter@skapet:~$ doas pfctl -t popflooders -T show | wc -l
    5239
[Wed May 17 19:38:42] peter@skapet:~$ doas pfctl -t popflooders -T show | grep -c \:
77

Meaning that of a total of 5239 miscreants trapped, only 77, or just short of 1.5 per cent, tried contacting us via IPv6. The cybercriminals, or at least the literal bottom feeders like the pop3 password gropers, are still behind the times in a number of ways.

Update 2017-06-13: BSDCan 2017 is now past, and the PF and networking tutorial with OpenBSD session had 19 people signed up for it. We made the slides available on the net here during the presentation and announced them on Twitter and elsewhere just after the session concluded. The revised tutorial was fairly well received, and it is likely that we will be offering roughly equivalent but not identical sessions at future BSD events or other occasions as demand dictates.

by Peter N. M. Hansteen (noreply@blogger.com) at April 19, 2017 09:45 PM

April 15, 2017

[bc-log]

Using stunnel and TinyProxy to obfuscate HTTP traffic

Recently there has been a lot of coverage in both tech and non-tech news outlets about internet privacy and how to prevent snooping both from service providers and governments. In this article I am going to show one method of anonymizing internet traffic: using a TLS-enabled HTTP/HTTPS proxy.

In this article we will walk through using stunnel to create a TLS tunnel with an instance of TinyProxy on the other side.

How does this help anonymize internet traffic

TinyProxy is an HTTP & HTTPS proxy server. By setting up a TinyProxy instance on a remote server and configuring our HTTP client to use this proxy, we can route all of our HTTP & HTTPS traffic through that remote server. This is a useful technique for getting around network restrictions that might be imposed by ISPs or governments.

However, it's not enough to simply route HTTP/HTTPS traffic to a remote server. This in itself does not add any additional protection to the traffic. In fact, with an out of the box TinyProxy setup, all of the HTTP traffic to TinyProxy would still be unencrypted, leaving it open to packet capture and inspection. This is where stunnel comes into play.

I've featured stunnel in earlier articles, but for those who are new to it: stunnel is a proxy that allows you to create a TLS tunnel between two or more systems. In this article we will use stunnel to create a TLS tunnel between the HTTP client system and TinyProxy.

By using a TLS tunnel between the HTTP client and TinyProxy our HTTP traffic will be encrypted between the local system and the proxy server. This means anyone trying to inspect HTTP traffic will be unable to see the contents of our HTTP traffic.

This technique is also useful for reducing the chances of a man-in-the-middle attack against HTTPS sites.

I say reducing because one of the caveats of this article is that, while routing our HTTP & HTTPS traffic through a TLS-tunneled HTTP proxy will help obfuscate and anonymize our traffic, the system running TinyProxy is still susceptible to man-in-the-middle attacks and HTTP traffic snooping.

Essentially, with this article, we are not focused on solving the problem of unencrypted traffic; we are simply moving our problem to a network where no one is looking. This is essentially the same approach VPN service providers take; the advantage of running your own proxy is that you control the proxy.

Now that we understand the end goal, let's go ahead and get started with the installation of TinyProxy.

Installing TinyProxy

The installation of TinyProxy is fairly easy and can be accomplished using the apt-get command on Ubuntu systems. Let's go ahead and install TinyProxy on our future proxy server.

server: $ sudo apt-get install tinyproxy

Once the apt-get command finishes, we can move to configuring TinyProxy.

Configuring TinyProxy

By default TinyProxy starts up listening on all interfaces for a connection to port 8888. Since we don't want to leave our proxy open to the world, let's change this by configuring TinyProxy to listen to the localhost interface only. We can do this by modifying the Listen parameter within the /etc/tinyproxy.conf file.

Find:

#Listen 192.168.0.1

Replace With:

Listen 127.0.0.1

Once complete, we will need to restart the TinyProxy service in order for our change to take effect. We can do this using the systemctl command.

server: $ sudo systemctl restart tinyproxy

After systemctl completes, we can validate that our change is in place by checking whether port 8888 is bound correctly using the netstat command.

server: $ netstat -na
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 127.0.0.1:8888          0.0.0.0:*               LISTEN

Based on the netstat output it appears TinyProxy is setup correctly. With that done, let's go ahead and setup stunnel.

Installing stunnel

Just like TinyProxy, the installation of stunnel is as easy as executing the apt-get command.

server: $ sudo apt-get install stunnel

Once apt-get has finished we will need to enable stunnel by editing the /etc/default/stunnel4 configuration file.

Find:

# Change to one to enable stunnel automatic startup
ENABLED=0

Replace:

# Change to one to enable stunnel automatic startup
ENABLED=1

By default on Ubuntu, stunnel is installed in a disabled mode. By changing the ENABLED flag from 0 to 1 within /etc/default/stunnel4, we are enabling stunnel to start automatically. However, our configuration of stunnel does not stop there.

Our next step with stunnel, will involve defining our TLS tunnel.

TLS Tunnel Configuration (Server)

By default stunnel will look in /etc/stunnel for any files that end in .conf. In order to configure our TLS stunnel we will be creating a file named /etc/stunnel/stunnel.conf. Once created, we will insert the following content.

[tinyproxy]
accept = 0.0.0.0:3128
connect = 127.0.0.1:8888
cert = /etc/ssl/cert.pem
key = /etc/ssl/key.pem

The contents of this configuration file are fairly straightforward, but let's go ahead and break down what each of these items means. We will start with the accept option.

The accept option is similar to the listen option from TinyProxy. This setting will define what interface and port stunnel will listen to for incoming connections. By setting this to 0.0.0.0:3128 we are defining that stunnel should listen on all interfaces on port 3128.

The connect option is used to tell stunnel what IP and port to connect to. In our case this needs to be the IP and port that TinyProxy is listening on; 127.0.0.1:8888.

An easy way to remember how accept and connect should be configured is that accept is where incoming connections should come from, and connect is where they should go to.

Our next two configuration options are closely related, cert & key. The cert option is used to define the location of an SSL certificate that will be used to establish our TLS session. The key option is used to define the location of the key used to create the SSL certificate.

We will set these to be located in /etc/ssl and in the next step, we will go ahead and create both the key and certificate.

Creating a self-signed certificate

The first step in creating a self-signed certificate is to create a private key. To do this we will use the following openssl command.

server: $ sudo openssl genrsa -out /etc/ssl/key.pem 4096

The above will create a 4096 bit RSA key. From this key, we will create a public certificate using another openssl command. During the execution of this command there will be a series of questions. These questions are used to populate the certificate with organization information. Since we will be using this for our own purposes we will not worry too much about the answers to these questions.

server: $ sudo openssl req -new -x509 -key /etc/ssl/key.pem -out /etc/ssl/cert.pem -days 1826
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:Arizona
Locality Name (eg, city) []:Phoenix
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Example.com
Organizational Unit Name (eg, section) []:
Common Name (e.g. server FQDN or YOUR name) []:proxy.example.com
Email Address []:proxy@example.com

Once the questions have been answered the openssl command will create our certificate file.

After both the certificate and key have been created, we will need to restart stunnel in order for our changes to take effect.

server: $ sudo systemctl restart stunnel4

At this point we have finished configuring the proxy server. We now need to setup our client.

Setting up stunnel (client)

Like the proxy server, our first step in setting up stunnel on our client is installing it with the apt-get command.

client: $ sudo apt-get install stunnel

We will also once again need to enable stunnel within the /etc/default/stunnel4 configuration file.

Find:

# Change to one to enable stunnel automatic startup
ENABLED=0

Replace:

# Change to one to enable stunnel automatic startup
ENABLED=1

After enabling stunnel we will need to restart the service with the systemctl command.

client: $ sudo systemctl restart stunnel4

From here we can move on to configuring the stunnel client.

TLS Tunnel Configuration (Client)

The configuration of stunnel in "client-mode" is a little different than the "server-mode" configuration we set earlier. Let's go ahead and add our client configuration into the /etc/stunnel/stunnel.conf file.

client = yes

[tinyproxy]
accept = 127.0.0.1:3128
connect = 192.168.33.10:3128
verify = 4
CAFile = /etc/ssl/cert.pem

As we did before, let's break down the configuration options shown above.

The first option is client. This option is simple, as it defines whether stunnel should be operating in client or server mode. By setting this to yes, we are defining that we would like to use client mode.

We covered accept and connect before and if we go back to our description above we can see that stunnel will accept connections on 127.0.0.1:3128 and then tunnel them to 192.168.33.10:3128, which is the IP and port that our stunnel proxy server is listening on.

The verify option is used to define what level of certificate validation should be performed. The option of 4 will cause stunnel to verify the remote certificate with a local certificate defined with the CAFile option. In the above example, I copied the /etc/ssl/cert.pem we generated on the server to the client and set this as the CAFile.

These last two options are important; without setting verify and CAFile, stunnel will open a TLS connection without necessarily checking the validity of the certificate. By setting verify to 4 and CAFile to the same cert.pem we generated earlier, we are giving stunnel a way to validate the identity of our proxy server. This will prevent our client from being hit with a man-in-the-middle attack.
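
For completeness, copying the certificate from the proxy server to the client could be as simple as something like the following; the username and the server address are assumptions for this example:

client: $ sudo scp user@192.168.33.10:/etc/ssl/cert.pem /etc/ssl/cert.pem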

Once again, let's restart stunnel to make our configurations take effect.

client: $ sudo systemctl restart stunnel4

With our configurations complete, let's go ahead and test our proxy.

Testing our TLS tunneled HTTP Proxy

In order to test our proxy settings we will use the curl command. While we are using a command line web client in this article, it is possible to use this same type of configuration with GUI based browsers such as Chrome or Firefox.
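
For example, Chrome can be launched pointing at the local stunnel endpoint with a command-line flag; treat the binary name as an assumption, since it varies by distribution:

client: $ google-chrome --proxy-server="http://localhost:3128"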

Before testing however, I will need to set the http_proxy and https_proxy environment variables. These will tell curl to leverage our proxy server.

client: $ export http_proxy="http://localhost:3128"
client: $ export https_proxy="https://localhost:3128"

With our proxy server settings in place, let's go ahead and execute our curl command.

client: $ curl -v https://google.com
* Rebuilt URL to: https://google.com/
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 3128 (#0)
* Establish HTTP proxy tunnel to google.com:443
> CONNECT google.com:443 HTTP/1.1
> Host: google.com:443
> User-Agent: curl/7.47.0
> Proxy-Connection: Keep-Alive
>
< HTTP/1.0 200 Connection established
< Proxy-agent: tinyproxy/1.8.3
<
* Proxy replied OK to CONNECT request
* found 173 certificates in /etc/ssl/certs/ca-certificates.crt
* found 692 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_ECDSA_AES_128_GCM_SHA256
*    server certificate verification OK
*    server certificate status verification SKIPPED
*    common name: *.google.com (matched)
*    server certificate expiration date OK
*    server certificate activation date OK
*    certificate public key: EC
*    certificate version: #3
*    subject: C=US,ST=California,L=Mountain View,O=Google Inc,CN=*.google.com
*    start date: Wed, 05 Apr 2017 17:47:49 GMT
*    expire date: Wed, 28 Jun 2017 16:57:00 GMT
*    issuer: C=US,O=Google Inc,CN=Google Internet Authority G2
*    compression: NULL
* ALPN, server accepted to use http/1.1
> GET / HTTP/1.1
> Host: google.com
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Location: https://www.google.com/
< Content-Type: text/html; charset=UTF-8
< Date: Fri, 14 Apr 2017 22:37:01 GMT
< Expires: Sun, 14 May 2017 22:37:01 GMT
< Cache-Control: public, max-age=2592000
< Server: gws
< Content-Length: 220
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< Alt-Svc: quic=":443"; ma=2592000; v="37,36,35"
<
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>
* Connection #0 to host localhost left intact

From the above output we can see that our connection was routed through TinyProxy.

< Proxy-agent: tinyproxy/1.8.3

And we were able to connect to Google.

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="https://www.google.com/">here</A>.
</BODY></HTML>

Given these results, it seems we have a working TLS-based HTTP/HTTPS proxy. Since this proxy is exposed on the internet, what would happen if it was found by someone simply scanning subnets for nefarious purposes?

Securing our tunnel further with PreShared Keys

In theory, as it stands today, they could use our proxy for their own purposes. This means we need some way to ensure that only our client can use this proxy; enter PreShared Keys.

Much like an API token, stunnel supports an authentication method called PSK or PreShared Keys. This is essentially what it sounds like: a token that has been shared between the client and the server in advance and used for authentication. To enable PSK authentication we simply need to add the following two lines to the /etc/stunnel/stunnel.conf file on both the client and the server.

ciphers = PSK
PSKsecrets = /etc/stunnel/secrets

By setting ciphers to PSK we are telling stunnel to use PSK based authentication. The PSKsecrets option is used to provide stunnel a file that contains the secrets in a clientname:token format.

In the above we specified the /etc/stunnel/secrets file. Let's go ahead and create that file and enter a sample PreShared Key.

client1:SjolX5zBNedxvhj+cQUjfZX2RVgy7ZXGtk9SEgH6Vai3b8xiDL0ujg8mVI2aGNCz
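
The token itself can be any sufficiently long random string. One way to generate one, assuming openssl is available, is shown below; paste the output after the client1: prefix in the secrets file on both the client and the server:

$ openssl rand -base64 48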

Since the /etc/stunnel/secrets file contains sensitive information, let's ensure that the permissions on the file are set appropriately.

$ sudo chmod 600 /etc/stunnel/secrets

By setting the permissions to 600 we are ensuring only the root user (the owner of the file) can read this file. These permissions will prevent other users from accessing the file and stumbling across our authentication credentials.

After setting permissions we will need to once again restart the stunnel service.

$ sudo systemctl restart stunnel4

With our settings complete on both the client and server, we now have a reasonably secure TLS proxy service.

Summary

In this article we covered setting up both an HTTP proxy and a TLS tunnel in front of it. As I mentioned before, this setup can be used to help hide HTTP and HTTPS traffic on a given network. However, it is important to remember that while this setup makes the client system a much trickier target, the proxy server itself could still be targeted for packet sniffing and man-in-the-middle attacks.

The key is to choose wisely where the proxy server is hosted.

Have feedback on the article? Want to add your 2 cents? Shoot a tweet out about it and tag @madflojo.


Posted by Benjamin Cane

April 15, 2017 07:30 AM

April 13, 2017

Feeding the Cloud

Automatically renewing Let's Encrypt TLS certificates on Debian using Certbot

I use Let's Encrypt TLS certificates on my Debian servers along with the Certbot tool. Since I use the "temporary webserver" method of proving domain ownership via the ACME protocol, I cannot use the cert renewal cronjob built into Certbot.

Instead, this is the script I put in /etc/cron.daily/certbot-renew:

#!/bin/bash

/usr/bin/certbot renew --quiet --pre-hook "/bin/systemctl stop apache2.service" --post-hook "/bin/systemctl start apache2.service"

pushd /etc/ > /dev/null
/usr/bin/git add letsencrypt ejabberd
DIFFSTAT="$(/usr/bin/git diff --cached --stat)"
if [ -n "$DIFFSTAT" ] ; then
    /usr/bin/git commit --quiet -m "Renewed letsencrypt certs"
    echo "$DIFFSTAT"
fi
popd > /dev/null

# Generate the right certs for ejabberd and znc
if test /etc/letsencrypt/live/jabber-gw.fmarier.org/privkey.pem -nt /etc/ejabberd/ejabberd.pem ; then
    cat /etc/letsencrypt/live/jabber-gw.fmarier.org/privkey.pem /etc/letsencrypt/live/jabber-gw.fmarier.org/fullchain.pem > /etc/ejabberd/ejabberd.pem
fi
cat /etc/letsencrypt/live/irc.fmarier.org/privkey.pem /etc/letsencrypt/live/irc.fmarier.org/fullchain.pem > /home/francois/.znc/znc.pem

It temporarily disables my Apache webserver while it renews the certificates and then only outputs something to STDOUT (since my cronjob will email me any output) if certs have been renewed.

Since I'm using etckeeper to keep track of config changes on my servers, my renewal script also commits to the repository if any certs have changed.

Finally, since my XMPP server and IRC bouncer need the private key and the full certificate chain to be in the same file, I regenerate these files at the end of the script. In the case of ejabberd, I only do so if the certificates have actually changed since overwriting ejabberd.pem changes its timestamp and triggers an fcheck notification (since it watches all files under /etc).

External Monitoring

In order to catch mistakes or oversights, I use ssl-cert-check to monitor my domains once a day:

ssl-cert-check -s fmarier.org -p 443 -q -a -e francois@fmarier.org
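
A minimal sketch of how that daily check could be wired up as a cronjob of its own; the file name is an assumption:

#!/bin/bash
# /etc/cron.daily/ssl-cert-check
ssl-cert-check -s fmarier.org -p 443 -q -a -e francois@fmarier.org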

I also signed up with Cert Spotter which watches the Certificate Transparency log and notifies me of any newly-issued certificates for my domains.

In other words, I get notified:

  • if my cronjob fails and a cert is about to expire, or
  • as soon as a new cert is issued.

The whole thing seems to work well, but if there's anything I could be doing better, feel free to leave a comment!

April 13, 2017 03:00 PM

Raymii.org

Distributed load testing with Tsung

At $dayjob I manage a large OpenStack Cloud. Next to that I also build high-performance and redundant clusters for customers. Think multiple datacenters, haproxy, galera or postgres or mysql replication, drbd with nfs or glusterfs and all sorts of software that can (and sometimes cannot) be clustered (redis, rabbitmq etc.). Our customers deploy their application on there and when one or a few components fail, their application stays up. Hypervisors, disks, switches, routers, all can fail without actual service downtime. Next to building such clusters, we also monitor and manage them. When we build such a cluster (fully automated with Ansible) we do a basic load test. We do this not for benchmarking or application flow testing, but to optimize the cluster components. Simple things like the mpm workers or threads in Apache or more advanced topics like MySQL or DRBD. Optimization there depends on the specifications of the servers used and the load patterns. Tsung is a high-performance but simple to configure and use piece of software written in Erlang. Configuration is done in a simple readable XML file. Tsung can be run distributed as well for large setups. It has good reporting and a live web interface for status and reports during a test.
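
As an illustration of that last point (this is not from the original post), a minimal single-node Tsung configuration for an HTTP test might look roughly like the following; the DTD path and the target hostname are assumptions:

<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd">
<tsung loglevel="notice">
  <clients>
    <!-- generate load from the local machine only -->
    <client host="localhost" use_controller_vm="true"/>
  </clients>
  <servers>
    <!-- the load balancer or VIP under test (hostname is an assumption) -->
    <server host="cluster.example.com" port="80" type="tcp"/>
  </servers>
  <load>
    <!-- 5 new simulated users per second, for 10 minutes -->
    <arrivalphase phase="1" duration="10" unit="minute">
      <users arrivalrate="5" unit="second"/>
    </arrivalphase>
  </load>
  <sessions>
    <session name="front-page" probability="100" type="ts_http">
      <request><http url="/" method="GET" version="1.1"/></request>
    </session>
  </sessions>
</tsung>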

April 13, 2017 12:00 AM

April 11, 2017

syslog.me

Exploring Docker overlay networks

In the past months I have made several attempts to explore Docker overlay networks, but there were a few pieces to set up before I could really experiment and… well, let’s say that I have probably approached the problem the wrong way and wasted some time along the way. Not again. I have set aside some time and worked agile enough to do the whole job, from start to finish. Nowadays there is little point in creating overlay networks by hand, except that it’s still a good learning experience. And a learning experience with Docker and networking was exactly what I was after.

When I started exploring multi-host Docker networks, Docker was quite different than it is now. In particular, Docker Swarm didn’t exist yet, and there was a certain amount of manual work required in order to create an overlay network, so that containers located in different hosts can communicate.

Before Swarm, in order to set up an overlay network one needed to:

  • have at least two docker hosts to establish an overlay network;
  • have a supported key/value store available for the docker hosts to sync information;
  • configure the docker hosts to use the key/value store;
  • create an overlay network on one of the docker hosts; if everything worked well, the network would “propagate” to the other docker hosts that had registered with the key/value store;
  • create named containers on different hosts; then try and ping each other using the names: if everything was done correctly, you would be able to ping the containers through the overlay network.

This looks like a simple high-level checklist. I’ll now describe the actual steps needed to get this working, leaving the details of my failures to the last section of this post.

Overlay networks for the impatient

A word of caution

THIS SETUP IS COMPLETELY INSECURE. Consul interfaces are exposed on the public interface of the host with no authentication. Consul communication is not encrypted. The traffic over the overlay network itself is not encrypted nor protected in any way. Using this set-up in any real-life environment is strongly discouraged.

Hardware set-up

I used three hardware machines, all running Debian Linux “Jessie”; all machines got a dynamic IPv4 address from DHCP:

  • murray: 192.168.100.10, running consul in server mode;
  • tyrrell: 192.168.100.13, docker host on a 4.9 Linux kernel installed from backports;
  • brabham: 192.168.100.15, docker host, same configuration as tyrrell.

Running the key-value store

Create a directory structure, e.g.:

mkdir consul
cd consul
mkdir bin config data

Download consul and unpack it in the bin subdirectory.

Finally, download the start-consul-master.sh script from my github account into the consul directory you created, make it executable and run it:

chmod a+x start-consul-master.sh
./start-consul-master.sh

If everything goes well, you’ll now have a consul server running on your machine. If not, you may have something to tweak in the script. In that case, please refer to the comments in the script itself.

On murray, the script started consul successfully. These are the messages I got on screen:

bronto@murray:~/Downloads/consul$ ./start-consul-master.sh 
Default route on interface wlan0
The IPv4 address on wlan0 is 192.168.100.10
The data directory for consul is /home/bronto/Downloads/consul/data
The config directory for consul is /home/bronto/Downloads/consul/config
Web UI available at http://192.168.100.10:8500/ui
DNS server available at 192.168.100.10 port 8600 (TCP/UDP)
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v0.8.0'
           Node ID: 'eeaf4121-262a-4799-a175-d7037d20695f'
         Node name: 'consul-master-murray'
        Datacenter: 'dc1'
            Server: true (bootstrap: true)
       Client Addr: 192.168.100.10 (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 192.168.100.10 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: 

==> Log data will now stream in as it occurs:

    2017/04/11 14:53:56 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:192.168.100.10:8300 Address:192.168.100.10:8300}]
    2017/04/11 14:53:56 [INFO] raft: Node at 192.168.100.10:8300 [Follower] entering Follower state (Leader: "")
    2017/04/11 14:53:56 [INFO] serf: EventMemberJoin: consul-master-murray 192.168.100.10
    2017/04/11 14:53:56 [WARN] serf: Failed to re-join any previously known node
    2017/04/11 14:53:56 [INFO] consul: Adding LAN server consul-master-murray (Addr: tcp/192.168.100.10:8300) (DC: dc1)
    2017/04/11 14:53:56 [INFO] serf: EventMemberJoin: consul-master-murray.dc1 192.168.100.10
    2017/04/11 14:53:56 [WARN] serf: Failed to re-join any previously known node
    2017/04/11 14:53:56 [INFO] consul: Handled member-join event for server "consul-master-murray.dc1" in area "wan"
    2017/04/11 14:54:03 [ERR] agent: failed to sync remote state: No cluster leader
    2017/04/11 14:54:06 [WARN] raft: Heartbeat timeout from "" reached, starting election
    2017/04/11 14:54:06 [INFO] raft: Node at 192.168.100.10:8300 [Candidate] entering Candidate state in term 3
    2017/04/11 14:54:06 [INFO] raft: Election won. Tally: 1
    2017/04/11 14:54:06 [INFO] raft: Node at 192.168.100.10:8300 [Leader] entering Leader state
    2017/04/11 14:54:06 [INFO] consul: cluster leadership acquired
    2017/04/11 14:54:06 [INFO] consul: New leader elected: consul-master-murray
    2017/04/11 14:54:07 [INFO] agent: Synced node info

Note that the consul web UI is available (on murray, it was on http://192.168.100.10:8500/ui as you can read from the output of the script). I suggest that you take a look at it while you progress in this procedure; in particular, look at how the objects in the KEY/VALUE section change when the docker hosts join the cluster and when the overlay network is created.
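
If you prefer the command line to the web UI, the same key/value data can be inspected through Consul’s HTTP API; for example, listing all keys currently stored (the docker-related ones will appear once the hosts have joined):

bronto@murray:~$ curl -s 'http://192.168.100.10:8500/v1/kv/?keys'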

Using the key/value store in docker engine

Since my set-up was not permanent (I just wanted to test out the overlay networks but I don’t mean to have a node always on and exposing the key/value store), I made only a temporary change to Docker Engine’s configuration by stopping the system service and then running the docker daemon by hand. That’s not something you would do in production. If you are interested in making the change permanent, please look at the wonderful article in the “Technical scratchpad” blog.
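
For the record, a permanent version of the same settings would normally go into /etc/docker/daemon.json on each host. A minimal sketch for brabham, mirroring the command line used below (untested here, so treat it as an assumption):

{
  "cluster-store": "consul://192.168.100.10:8500",
  "cluster-advertise": "192.168.100.15:0"
}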

On both docker hosts I ran:

systemctl stop docker.service

Then on brabham I ran:

/usr/bin/dockerd -H unix:///var/run/docker.sock --cluster-store consul://192.168.100.10:8500 --cluster-advertise=192.168.100.15:0

while on tyrrell I ran:

/usr/bin/dockerd -H unix:///var/run/docker.sock --cluster-store consul://192.168.100.10:8500 --cluster-advertise=192.168.100.13:0

If everything goes well, messages on screen will confirm that you have successfully joined consul. E.g.: this was on brabham:

INFO[0001] 2017/04/10 19:55:44 [INFO] serf: EventMemberJoin: brabham 192.168.100.15

This one was also on brabham, letting us know that tyrrell had also joined the family:

INFO[0058] 2017/04/10 19:56:41 [INFO] serf: EventMemberJoin: tyrrell 192.168.100.13

The parameter --cluster-advertise is quite important. If you don’t include it, the creation of the overlay network at the following step will succeed, but containers in different hosts will fail to communicate through the overlay network, which will make the network itself pretty pointless.

Creating the overlay network

On one of the docker hosts, create an overlay network. In my case, I ran the following command on tyrrell to create the network 10.17.0.0/16 to span both hosts, and name it bronto-overlay:

$ docker network create -d overlay --subnet=10.17.0.0/16 bronto-overlay
c4bcd49e4c0268b9fb22c7f68619562c6a030947c8fe5ef2add5abc9446e9415

The output, a long network ID, looks promising. Let’s verify that the network is actually created:

$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
3277557c8f42        bridge              bridge              local
80dfa291f573        bronto-bridged      bridge              local
c4bcd49e4c02        bronto-overlay      overlay             global
dee1b78004ef        host                host                local
f1fb0c7be326        none                null                local

Promising indeed: there is a bronto-overlay network and the scope is global. If the command really succeeded, we should find it on brabham, too:

$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
90221b0a57e1        bridge              bridge              local
c4bcd49e4c02        bronto-overlay      overlay             global
3565b28a327b        docker_gwbridge     bridge              local
f4691f3066cb        host                host                local
86bdd8ab9b90        none                null                local

There it is. Good.

Test that everything works

With the network set up, I should be able to create one container on each host, bound to the overlay network, and they should be able to ping each other. Not only that: they should be able to resolve each other’s names through Docker’s DNS service despite being on different machines. Let’s see.

On each machine I ran:

docker run --rm -it --network bronto-overlay --name in-$HOSTNAME debian /bin/bash

That will create a container named in-tyrrell on tyrrell and a container named in-brabham on brabham. Their main network interface will appear to be bound to the overlay network. E.g. on in-tyrrell:

root@d86df4662d7e:/# ip addr show dev eth0
68: eth0@if69: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default 
    link/ether 02:42:0a:11:00:03 brd ff:ff:ff:ff:ff:ff
    inet 10.17.0.3/16 scope global eth0
       valid_lft forever preferred_lft forever

We now try to ping the other container:

root@d86df4662d7e:/# ping in-brabham
PING in-brabham (10.17.0.2): 56 data bytes
64 bytes from 10.17.0.2: icmp_seq=0 ttl=64 time=0.857 ms
64 bytes from 10.17.0.2: icmp_seq=1 ttl=64 time=0.618 ms
^C--- in-brabham ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.618/0.738/0.857/0.120 ms
root@d86df4662d7e:/#

Success!

How things really went

Things weren’t so straightforward in reality.

I will use a consul container. Or not.

When I was designing my experiment I thought I would run consul through a consul container in one of the two hosts. I thus spent some time playing with the progrium/consul container, as indicated by Docker’s documentation, until I found out that it hadn’t been updated for two years… You can find the outcome of my experiment with progrium/consul on github.

I thus switched to Docker’s official consul image. It was similar enough to progrium/consul to seem an easy port, and different enough that it made me hit the wall a few times before I got things right with that image, too. With the help of a couple of scripts I could now easily start a server container and a client container. You can find the outcome of my experiments with docker’s consul image on github, too.

Then, I understood that the whole thing was pointless. You don’t need to have a consul client running on the docker host, because the docker engine itself will be the client. And you can’t have the consul server running as a container on one of the docker hosts. In fact, the docker engine itself needs to register at start-up with the consul server, but the consul server won’t be running because docker hasn’t started… ouch!

That wasn’t a complete waste of time anyway. In fact, with the experience acquired in making consul containers work and the scripts that I had already written, it was an easy task to configure a consul server on a separate laptop (murray). The script that I used to run the consul server is also available on github.

The documentation gap

When I started playing with docker, Docker Swarm didn’t exist yet, or was still in a very primitive stage. The only way to make containers on different hosts communicate were overlay networks and there was no automated procedure to create them, therefore Docker’s documentation explained in some details how they should be set up. With the release of Swarm, overlay networks are created automatically by Swarm itself and not much about the “traditional” set up of the overlay networks was left in the documentation. Walking through the “time machine” in the docs web page proved to be problematic, but luckily enough I found this article on the Technical Scratchpad blog that helped me to connect the dots.

Don’t believe in magic

When I initially started the docker engine and bound it to the consul server I left out the --cluster-advertise option. All in all, docker must be smart enough to guess the right interface for that, right?

No, wrong.

When I did leave the option out, everything seemed to work. In particular, when I created the overlay network on tyrrell I could also see it on brabham. And when I created a container on each host, docker network inspect bronto-overlay showed that both containers were registered on the network. Except that they couldn’t communicate. The problem seems to be known, and things started to work only when I added the --cluster-advertise option with a proper value for each host.


Tagged: consul, Debian, docker, linux, networking

by bronto at April 11, 2017 02:42 PM

Sarah Allen

the making of a fireplace

I built a fake fireplace in celebration of our team shipping Cloud Functions for Firebase. The codename was “hearth” — the product was almost ready for public beta by the time I joined the team, yet I know it was an epic journey to create a long-awaited feature and I felt the milestone deserved dramatic punctuation.

Red brick surrounds a monitor with video of log fire, wooden top has rows of whiskey bottles and a card

If you would like to build your very own fake fireplace, here’s how…

Initial Research

I reached out to our helpful facilities staff to make sure that I understood the constraints of the space. (I didn’t want to buy or create something that would be immediately taken down!) James Miller from facilities investigated some options:

“Since a hearth is technically the flat stone floor or the part outside of the fireplace, putting up just a fireplace and mantel leaves out the most important part, but including it presents a tripping hazard.” He suggested a quick and easy solution would be to temporarily change the display to play an 8hr yule log video, or to set up an extra computer monitor with some fake foam bricks. He noted that some prop fireplaces look pretty good in the photos, but are just cardboard. There are more expensive electronic fireplaces, but we can’t have anything that includes heating features similar to a space heater, and he reported “most that I’ve seen online are also missing the most important part… the actual hearth.”

Initially I felt that a literal interpretation of the “hearth” code name was not a critical element. We explored buying a real fake fireplace, but the ones we found were very small and we felt they were very expensive relative to the potential impact. However, his thoughts on the meaning of “hearth” later led to some important design elements. I love being involved in a collaborative creative process!

James suggested that something in a stage prop style would have dramatic effect, perhaps using styrofoam so that it would be light and easy to move, if we needed to shift it in the future around team moves. In brainstorming location, he helped me remember to pick out a spot with an electrical outlet.

Measure Twice, Cut Once

An aisle with people at desks on either side; at the end is a small bookshelf with bottles of alcohol and above that a large monitor on the wall.

I surreptitiously took a photo of the area, and arrived early one morning with a tape measure to determine its maximum dimension. I then measured the computer monitor I planned to use to determine the minimum internal dimension.

There was actually a third, very important measurement: the internal size of my car, which later became a critical optimization. I designed the fireplace in three pieces, so it could fit into the car, which also made it much lighter and easier to transport. Styrofoam is rather fragile, and the smaller pieces fit easily through doors.

Tools & Supplies

Tools: tape measure, hammer, screw driver, propane torch, safety goggles, mask, electric drill, electric sander, garden shears, mat knife, level, steel square

Supplies: small nails, wood, styrofoam panels, gorilla glue, sandpaper, acrylic paint (red, yellow, white, black), gloss latex paint (off-white), sharpie, cardboard

I had never worked with styrofoam before and read that it could be carved with a sharp knife, acetone or a hot knife. If it were a small project, a regular soldering iron would likely have been fine, but I wanted something that would be fast and really liked that the propane torch I found had two different tips as well as an open flame option. The small hand-held propane filled torch is also easier to work with than my electric soldering iron which requires an extension cord.

Construction Details

I tested carving and painting the styrofoam very early, including figuring out how long the paint took to dry. I found that painting the white cracks before painting the red bricks made it come out a bit better. It was also good to test carving to create an effect that looks like mortar.

See key steps in photos

  1. Wooden frame: I used 1″ thick boards for the frame. Discount Builders in San Francisco has a friendly staff and cut long 12′ or 8′ boards into the lengths I needed. The hearth is a separate component, which is like a hollow box with one side missing, where the missing side is actually attached to the larger frame, so the top can rest on it for stability and is not actually physically attached. I used a level and steel square for an even frame and wood screws.
  2. Top: I used a single 1″ thick board and drilled small holes for pegs to attach the top to the frame in a way that could be easily removed, yet fit snugly. I used garden shears to cut a long wooden dowel into 1″ lengths to create the pegs. I used an electric sander for rounded edges on three sides, and finished with fine sandpaper. I then coated with a few coats of gloss paint (which would have been easier if I had remembered primer!).
  3. Styrofoam: for the large panels I found that using small nails to hold the styrofoam in place made it easier to keep everything steady while gluing it all together. I attached all of the styrofoam panels before carving bricks with the torch. Large cuts were most effective with a mat knife.
  4. Carving the bricks: I found a few inspirational Fireplace images via Google Image Search and drew bricks with a Sharpie, then traced the lines with the torch using the soldering gun tip. Then I went back and made the brick edges a bit irregular and created a few cracks in random bricks. I also used an open flame in areas to create more texture.
  5. Painting the bricks: I mixed acrylic paint with a little dish soap and water, which makes it easier to coat the styrofoam. I noticed that most fireplaces have irregular brick colors, so I varied the colors a bit with yellow and brown highlights, as well as black “sooty” areas.
  6. The inside is painted black with a piece of cardboard as backing.

Additional Resources

I studied sculpture in college, so I had practical experience with wood construction, and learned a bit more from the amazing people who publish about their experience on the web. Here are some links to resources that I found particularly helpful.

Tutorials

Videos

by sarah at April 11, 2017 03:59 AM

April 10, 2017

Raymii.org

Run software on the tty1 console instead of getty login on Ubuntu 14.04 and 16.04

Recently I wanted to change the default login prompt on the tty1 console on an OpenStack instance to automatically run htop. Instead of logging in via the console, I wanted it to start up htop right away and nothing else. Ubuntu 14.04 uses init and Ubuntu 16.04 uses systemd. Both ways are shown in this tutorial.
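
As a rough sketch of the systemd (16.04) variant, one common approach is to override the getty@tty1 unit so it runs htop instead of a login prompt; the paths here are assumptions, and the linked tutorial has the full details for both init systems:

# sudo systemctl edit getty@tty1, then add:
[Service]
ExecStart=
ExecStart=-/usr/bin/htop
StandardInput=tty
StandardOutput=tty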

April 10, 2017 12:00 AM

April 01, 2017

syslog.me

Improving your services, the DevOps way

On March 10th I was in Bologna for Incontro DevOps Italia 2017, the Italian DevOps meeting organized by the great people at BioDec. The three tracks featured several talks in both Italian and English, and first-class international speakers. And, being a conference in Bologna, it also featured first-class local food that no other conference around the world will ever be able to match.

In my presentation, “Improving your services, the DevOps way”, I illustrated the methodology we used in Opera to improve our mail service with a rolling approach, where the new infrastructure was built alongside the existing one through collaboration and a mix of agile and DevOps techniques, and progressively replaced it. The slides are available on SpeakerDeck.

I understand from the organizers that the recordings will be available some time in April. However, they will be of little use for non-Italian speakers, since at the very last minute I decided to hold the speech in Italian — a choice that I deeply regretted. I have a plan to do the presentation again for the DevOps Norway meetup group, though I haven’t set a date yet. When I do, I’ll see if I can at least provide a Periscope stream.


Tagged: conference, DevOps, Opera, Sysadmin

by bronto at April 01, 2017 03:07 PM

Feeding the Cloud

Manually expanding a RAID1 array on Ubuntu

Here are the notes I took while manually expanding a non-LVM encrypted RAID1 array on an Ubuntu machine.

My original setup consisted of a 1 TB drive along with a 2 TB drive, which meant that the RAID1 array was 1 TB in size and the second drive had 1 TB of unused capacity. This is how I replaced the old 1 TB drive with a new 3 TB drive and expanded the RAID1 array to 2 TB (leaving 1 TB unused on the new 3 TB drive).

Partition the new drive

In order to partition the new 3 TB drive, I started by creating a temporary partition on the old 2 TB drive (/dev/sdc) to use up all of the capacity on that drive:

$ parted /dev/sdc
unit s
print
mkpart
print

Then I initialized the partition table and created the EFI partition on the new drive (/dev/sdd):

$ parted /dev/sdd
unit s
mktable gpt
mkpart

Since I want to have the RAID1 array be as large as the smaller of the two drives, I made sure that the second partition (/home) on the new 3 TB drive had:

  • the same start position as the second partition on the old drive
  • the end position of the third partition (the temporary one I just created) on the old drive

I created the partition and flagged it as a RAID partition:

mkpart
toggle 2 raid

and then deleted the temporary partition on the old 2 TB drive:

$ parted /dev/sdc
print
rm 3
print

Create a temporary RAID1 array on the new drive

With the new drive properly partitioned, I created a new RAID array for it:

mdadm /dev/md10 --create --level=1 --raid-devices=2 /dev/sdd1 missing

and added it to /etc/mdadm/mdadm.conf:

mdadm --detail --scan >> /etc/mdadm/mdadm.conf

which required manual editing of that file to remove duplicate entries.

Create the encrypted partition

With the new RAID device in place, I created the encrypted LUKS partition:

cryptsetup -h sha256 -c aes-xts-plain64 -s 512 luksFormat /dev/md10
cryptsetup luksOpen /dev/md10 chome2

I took the UUID for the temporary RAID partition:

blkid /dev/md10

and put it in /etc/crypttab as chome2.
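
A sketch of what the resulting /etc/crypttab line might look like, with the UUID placeholder standing in for the blkid output above:

chome2 UUID=<uuid-from-blkid> none luks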

Then, I formatted the new LUKS partition and mounted it:

mkfs.ext4 -m 0 /dev/mapper/chome2
mkdir /home2
mount /dev/mapper/chome2 /home2

Copy the data from the old drive

With the home partitions of both drives mounted, I copied the files over to the new drive:

eatmydata nice ionice -c3 rsync -axHAX --progress /home/* /home2/

making use of wrappers that preserve system responsiveness during I/O-intensive operations.

Switch over to the new drive

After the copy, I switched over to the new drive in a step-by-step way:

  1. Changed the UUID of chome in /etc/crypttab.
  2. Changed the UUID and name of /dev/md1 in /etc/mdadm/mdadm.conf.
  3. Rebooted with both drives.
  4. Checked that the new drive was the one used in the encrypted /home mount using: df -h.

Add the old drive to the new RAID array

With all of this working, it was time to clear the mdadm superblock from the old drive:

mdadm --zero-superblock /dev/sdc1

and then change the second partition of the old drive to make it the same size as the one on the new drive:

$ parted /dev/sdc
rm 2
mkpart
toggle 2 raid
print

before adding it to the new array:

mdadm /dev/md1 -a /dev/sdc1

Rename the new array

To change the name of the new RAID array back to what it was on the old drive, I first had to stop both the old and the new RAID arrays:

umount /home
cryptsetup luksClose chome
mdadm --stop /dev/md10
mdadm --stop /dev/md1

before running this command:

mdadm --assemble /dev/md1 --name=mymachinename:1 --update=name /dev/sdd2

and updating the name in /etc/mdadm/mdadm.conf.

The last step was to regenerate the initramfs:

update-initramfs -u

before rebooting into something that looks exactly like the original RAID1 array but with twice the size.

April 01, 2017 06:00 AM

The Lone Sysadmin

Install the vCenter Server Appliance (VCSA) Without Ephemeral Port Groups

Trying to install VMware vCenter in appliance/VCSA form straight to a new ESXi host? Having a problem where it isn’t listing any networks, and it’s telling you that “Non-ephemeral distributed virtual port groups are not supported” in the little informational bubble next to it? Thinking this is Chicken & Egg 101, because you can’t have an […]

The post Install the vCenter Server Appliance (VCSA) Without Ephemeral Port Groups appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at April 01, 2017 04:07 AM