What to do with Unity 8 now

As you’re probably aware, Ubuntu 16.10 was released yesterday, and it brings with it the Unity 8 desktop session as a preview of what’s being worked on right now and a reflection of the current state of play.

You might have already logged in and kicked the proverbial tyres.  If not, I would urge you to do so.  Please take the time to install a couple of apps as laid out here:


The main driver for getting Unity 8 into 16.10 was the chance to get it into the hands of users so we can get feedback and bug reports.  If you find something doesn’t work, please log a bug.  We don’t monitor every forum or comments section on the web, so the absolute best way to provide your feedback to people who can act on it is a bug report with clear steps on how to reproduce the issue (in the case of crashes) or an explanation of why you think a particular behaviour is wrong.  This is how you get things changed or fixed.

You can contribute to Ubuntu by simply playing with it.

Read about logging bugs in Ubuntu here: https://help.ubuntu.com/community/ReportingBugs

And when you are ready to log a bug, log it against Unity 8 here: https://bugs.launchpad.net/ubuntu/+source/unity8




Apache – 20 second lag before serving pages

TL;DR:  There is no such thing as a “none” keyword in Apache 2’s Allow/Deny directives.  If you’ve got “deny from none” or “allow from none” then you’re doing DNS lookups on every host that connects, whether you want to or not.


I was experiencing a very annoying problem trying to serve static HTML pages and CGI scripts from Apache 2 recently.  The problem manifested itself like this:

  • Running the scripts on the server hosting Apache showed they ran in well under a second
  • Connecting to the Apache server from the LAN, everything was fine and ran in under a second
  • Connecting to the Apache server from the Internet, but from a machine known to my network, ran fine
  • Connecting from an AWS Lambda script, suddenly there is a 20 second or more delay before getting data back
  • Connecting from Digital Ocean, there is a 20 second delay
  • Connecting from another computer on the internet, there is a 20 second delay

What the heck is going on here?

I spent time trying to debug my CGI scripts and adding lots more logging and finally convinced myself that it was a problem with the Apache config and not something like MTUs or routing problems.

But what was causing it?  It started to feel like a DNS related issue since the machines where it ran fine were all known to me, and so had corresponding entries in my local DNS server.  But but but… I clearly had “HostnameLookups Off” in my apache2.conf file.  When I looked at the logs again, I noticed that hostnames were indeed being looked up, even though I told it not to.


Why?  Because I don’t know how to configure Apache servers properly.  At some point in time I thought this was a good idea:

Order deny,allow
Deny from none
Allow from all

But, there is no such thing as a “none” keyword.  Apache interprets “none” as a host name and so has to look it up to see if it’s supposed to be blocking it or not, which causes DNS lookup delays and hostnames to appear in your Apache logs.
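For completeness, here’s what I believe the corrected stanza should look like – just drop the bogus line (this is the old Apache 2.2 style syntax; on Apache 2.4 you’d use “Require all granted” instead):

Order deny,allow
Allow from all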

Enlightenment came from here: http://kb.simplywebhosting.com/idx/6/213/article/

There is also a suggestion that inline comments can do the same thing here:  https://www.drovemebatty.com/wp/entries/11



Unity 7 Low Graphics Mode

Unity 7 has had a low graphics mode for a long time but recently we’ve been making it better.

Eleni has been making improvements to reduce the amount of visual effects that are seen while running in low graphics mode.  At a high level this includes things like:

  • Reducing the amount of animation in elements such as the window switcher, launcher and menus (in some cases down to zero)
  • Removing blur and fade in/out
  • Reducing shadows

The result of these changes will be beneficial to people running Ubuntu in a virtual machine (where hardware 3D acceleration is not available) and for remote-control of desktops with VNC, RDP etc.

Low graphics mode should enable itself when it detects that certain GL features are not available (e.g. in a virtualised environment), but there are times when you might want to force it on.  Here’s how you can force low graphics mode on 16.04 LTS (Xenial):

  1. nano ~/.config/upstart/lowgfx.conf
  2. Paste this into it:
start on starting unity7
pre-start script
    initctl set-env -g UNITY_LOW_GFX_MODE=1
end script
  3. Log out and back in

If you want to stop using low graphics mode, comment out the initctl line by placing a ‘#’ at the start of the line.

This hack won’t work in 16.10 Yakkety because we’re moving to systemd for the user session.  I’ll write up some instructions for 16.10 once it’s available.

Here’s a quick video of some of the effects in low graphics mode:



DHCP clients not registering hostnames in DNS automatically

To remind myself as much as anything:

I run a dnsmasq server on my router (which is a Raspberry Pi 2) to handle local DNS, DNS proxying and DHCP. For some reason one of the hosts stopped registering its hostname with the DHCP server, and so I couldn’t resolve its name to an IP address from other clients on my network.

I’m pretty sure it used to work, and I’m also pretty sure I didn’t change anything – so why did it suddenly stop? My theory is that the disk on the client became corrupt and a fsck fix removed some files.

Anyway, the cause is that the DHCP client didn’t know to send its hostname along with the DHCP request.

This is fixed by creating (or editing) /etc/dhcp/dhclient.conf and adding this line:

send host-name = gethostname();
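To pick up the change without a reboot you can release and renew the lease by hand; something along these lines should do it (assuming the interface is eth0):

sudo dhclient -r eth0
sudo dhclient eth0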


Online searches in the dash to be off by default.

Scopes are a leading feature of the Ubuntu Phone and of Unity 8 in general.  That concept, the story of scopes, started out in Unity 7 back in 12.10, when we added results from online searches to the dash home screen.

Well, we’re making some changes to the Unity 7 Dash searches in 16.04 LTS.  On Unity 8 the Scopes concept has evolved into something which gives the user finer control over what is searched and provides more targeted results.  This functionality cannot be added into Unity 7 and so we’ve taken the decision to gracefully retire some aspects of the Unity 7 online search features.

What is changing?

First of all online search will be off by default.  This means that out-of-the-box none of your search terms will leave your computer.  You can toggle this back on through the Security & Privacy option in System Settings.  Additionally, if you do toggle this back on then results from Amazon & Skimlinks will remain off by default.  You can toggle them back on if you wish.  Further, the following scopes will be retired from the default install and moved to the Universe repository for 16.04 LTS onwards:

    1. Audacious
    2. Clementine
    3. gmusicbrowser
    4. Gourmet
    5. Guayadeque
    6. Musique

The Music Store will be removed completely for 16.04 LTS onwards.

Why now?

By making these changes now we can better manage our development priorities, servers, network bandwidth etc. throughout the LTS period.  We allow ourselves more freedom to make changes without further affecting the LTS release (e.g. SRUs); specifically, we can better manage the eventual transition to Unity 8 and not have to maintain two sets of scope infrastructure for the duration of the LTS support period of five years.

What about previous supported releases?

Search results being off by default will not affect previous releases or upgrades, only new installs (i.e. we will not touch your existing settings).  Changes to search results from Amazon & Skimlinks will also only affect 16.04 and beyond.  The removal of the Music Store will be SRU’d back to older supported releases and the option will be removed from the Dash.

When will this happen?

We’re preparing to make the changes in the archive, to Unity 7 and to the Online Search servers right now.  This will take a little while to test and roll out.  We’ll let you know once all the changes are in Xenial.

Hacking 433Mhz support into a cheap Carbon Monoxide detector

Skill level:  Easy

My home automation systems use two mechanisms for communication:  Ethernet (both wired and wireless) and 433MHz OOK radio.

433MHz transmitters are readily available and cheap, but unreliable.  Wifi enabled MCUs such as the ESP8266 are also cheap (coming in at around the same cost as an Arduino clone, a 433MHz transmitter and a bag of bits to connect them together) and reliable enough, but extremely power hungry.  If I can plug a project into the mains then I’ll use an ESP8266 and a mobile phone charger for power; if the project needs to run off batteries then a 433MHz equipped Arduino is the way I’ve gone.

Like most people playing with 433MHz radio I found the reliability and range of the radio link to be super flaky.  I’ve finally got a more-or-less reliable set-up:

  • A full wave dipole antenna at the receiver
  • A high quality receiver from RF Solutions in place of the cheap ones which are bundled with transmitters. A decent receiver on eBay
  • A big capacitor on the transmitter.  I saw the frequency and amplitude drifting massively during transmission.  Adding a 470µF cap helps.  Allow time for the cap to charge and the oscillator to stabilise, a few seconds delay seemed to do the trick.
  • Using the RCSwitch library on the transmitter:
    • RCSwitch mySwitch = RCSwitch();
    • mySwitch.setProtocol(2); // Much longer pulse lengths = much better range?
    • mySwitch.setRepeatTransmit(20); // Just brute-force it!

With this setup I can receive a 24-bit number from an Arduino running off 2 AA batteries and a coiled 1/2 wave antenna from about 5 meters away indoors, through walls.  That’s still poor, but it does the job.  Increasing the voltage to the transmitter would probably help.
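Putting those RCSwitch settings together, a minimal transmitter sketch looks something like this (a rough sketch only – the data pin, the 24-bit code and the start-up delay are illustrative):

#include <RCSwitch.h>

RCSwitch mySwitch = RCSwitch();

void setup() {
  // Let the big capacitor charge and the oscillator stabilise before transmitting
  delay(3000);
  mySwitch.enableTransmit(10);     // 433MHz transmitter data pin on D10
  mySwitch.setProtocol(2);         // Much longer pulse lengths = much better range?
  mySwitch.setRepeatTransmit(20);  // Just brute-force it!
}

void loop() {
  mySwitch.send(123456, 24);       // Send a 24-bit number to the receiver
  delay(60000);                    // And again in a minute
}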

Once you have a reliable 433MHz receiver setup then you can also buy off the shelf 433MHz enabled home automation gizmos like this smoke alarm or these door sensors.  They have a set of jumpers inside where you can set an ID, which is essentially the same 24bit number that RCSwitch lets you transmit.  For what it’s worth I also have kite-marked smoke detectors in my house, but from the testing I’ve done with a bit of smoldering paper the cheap imports work just fine.

I couldn’t find a cheap Carbon Monoxide detector which also has 433MHz support, so I thought I’d quickly hack one together out of this Carbon Monoxide detector, an Arduino clone and a 433MHz radio:

CO Alarm inside

You can barely notice it!

It’s certainly untidy, but it does the job.  If I had PCB facilities at home I’m fairly sure it could be made to fit inside the alarm, along with some more holes in the case for ventilation.

The premise is simple enough.  The Arduino is powered by the 3v3 regulator on the CO alarm PCB.  The cathode of the red alarm LED is connected to pin 2 of the Arduino as an external interrupt.  When the pin goes low the Arduino wakes up and sends its 24-bit ID number over the radio, which is picked up by the receiver, which in turn sends an SMS alert, switches the boiler off, etc.  I’ve connected the radio transmitter directly to the 3 x AA batteries (4.5 volts) via a transistor which is switched by a pin on the Arduino.  In stand-by mode the additional equipment draws a fraction of a milliamp and so I’m not worried about draining the batteries faster.
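The sketch on the Arduino boils down to something like this (a rough sketch of the idea, using the same RCSwitch library as before plus the Rocket Scream LowPower library; pin numbers and the ID are illustrative, and the transistor wiring is as described above):

#include <RCSwitch.h>
#include <LowPower.h>

const int ALARM_PIN = 2;              // Cathode of the red alarm LED, external interrupt
const int RADIO_POWER_PIN = 4;        // Drives the transistor switching the transmitter supply
const int RADIO_DATA_PIN = 10;        // Data pin of the 433MHz transmitter
const unsigned long MY_ID = 123456;   // 24-bit ID the receiver is listening for

RCSwitch mySwitch = RCSwitch();

void wakeUp() {
  // Nothing to do here - the interrupt just wakes the processor
}

void setup() {
  pinMode(ALARM_PIN, INPUT);           // Held high until the alarm LED lights
  pinMode(RADIO_POWER_PIN, OUTPUT);
  digitalWrite(RADIO_POWER_PIN, LOW);  // Transmitter off in stand-by
  mySwitch.enableTransmit(RADIO_DATA_PIN);
  mySwitch.setProtocol(2);
  mySwitch.setRepeatTransmit(20);
}

void loop() {
  // Sleep until the alarm LED pulls pin 2 low
  attachInterrupt(digitalPinToInterrupt(ALARM_PIN), wakeUp, FALLING);
  LowPower.powerDown(SLEEP_FOREVER, ADC_OFF, BOD_OFF);
  detachInterrupt(digitalPinToInterrupt(ALARM_PIN));

  // Power up the transmitter, give the oscillator time to settle, send our ID
  digitalWrite(RADIO_POWER_PIN, HIGH);
  delay(3000);
  mySwitch.send(MY_ID, 24);
  digitalWrite(RADIO_POWER_PIN, LOW);
}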

As with the smoke alarms, this is not my only source of Carbon Monoxide detection.  I’ve yet to test its sensitivity.  This is considered to be a “well, if it works, and it turns the boiler off automatically, then it’s certainly worth a go, but I’m not relying on it” project.

10 years with Ubuntu


Today I have had a Launchpad account for ten years!

I got started out on this road around 1992.  I remember the day Stuart got a PC and installed Minix on it.  That box was beige, naturally, about 3 feet square and constructed from inch thick iron plate.  Minix was totally alien when compared to the Acorn MOS and RISC OS powered machines I’d used until then, and absolutely intriguing.

A few years later at university I encountered VAX/VMS and Sun SPARCstations and The Internet and Surfers and Mozilla and a Gopher connected Coke machine.

Then out into the big wide world of work and run-ins with AS400 and RS/6000s running AIX.  During this time I started seeing more and more Red Hat in places where there once would have been the more established players, providing email and web servers.  The fascination with *nix was always there and I started using Red Hat at home for fun.

I quickly ran into frustrations with RPMs and Stuart, always a source of wisdom, suggested I try Debian.

Dpkg made my life a whole lot easier and I started using Debian as my default OS for everything.  Pretty soon after that I found myself compiling kernels, modules and software packages because I needed or wanted something in a newer version.  That, coupled with the availability of cheap unbranded webcams, sound cards, network cards, TV cards etc. and a strong desire to make these things work with Linux, meant that I had found a wonderful way to stay up until 4 in the morning getting more and more frustrated.  The phrase “I’m going home to play with the kernel” was frequently questioned by my boss Jeremy.  I wanted these things to work but was endlessly faffing about trying to make it happen.

Better call Stuart.

“You should try this new Debian based distribution called Ubuntu” he said.

So I did, and it just worked.  A box fresh kernel with all the goodies I needed already compiled in and an up-to-date GNOME desktop (I’d set my allegiances before trying Ubuntu so this was another tick in the box), not forgetting one of the brownest themes known to man.

And that was that.  Ubuntu worked for me and I was immediately a fan.

And here I am today, 10 years later, still running Ubuntu.  My servers run Ubuntu, all the desktops in my house run Ubuntu, I have an Ubuntu powered phone and soon I’ll have an Ubuntu powered Mycroft with which I’ll be able to control my Ubuntu powered things while wearing my Ubuntu T shirt and drinking tea (should that be kool-aid?) from my Ubuntu mug.

I salute my Ubuntu brothers and sisters.  Thanks for making all of this possible.





Big Bug Bonanza Ubuntu 16.04 LTS

The vast majority of Ubuntu desktop users prefer to stick with a long term support release (https://wiki.ubuntu.com/LTS) rather than the regular 6 monthly releases, so 16.04 LTS represents the next big upgrade for most Ubuntu users.  16.04 LTS will be running Unity 7 by default as it has done for the last six years and our focus for the Unity 7 stack is fixing bugs which adversely affect the user experience of the desktop.

Over the years the bug lists for Unity 7 and Compiz have grown to become unmanageable.  To make sure we are focusing on the most important issues we have to do some serious tidying up of the bug lists and we need some help.

At the time of writing there are 2680 open bugs for Unity 7 (https://launchpad.net/ubuntu/+source/unity/+bugs) and 1455 for Compiz (https://launchpad.net/ubuntu/+source/compiz/+bugs) and 322 for nux, our graphical toolkit (https://launchpad.net/ubuntu/+source/nux/+bugs).

We’re proposing to cut this down to size with the following plan:

  1. Close all bugs which relate to an unsupported release of Ubuntu.  We will do a manual review of the high heat bugs affecting unsupported releases first, but low heat bugs will most likely be closed by a robot.  The rationale is that the majority of these older bugs will have been fixed and that the original reporter is probably no longer affected by the bug and has forgotten to close it.  Plus manually screening each of these bugs cannot be done at this scale in a reasonable timeframe.  There will be some collateral damage which is an unfortunate but unavoidable side-effect.  Sorry if this affects you, but please do re-open the bug against a supported release.
  2. Close all private apport bugs and review public ones with a view to closing them as well.  Apport is the automated error reporting tool which runs when it detects a crash.  It can open a private bug in Launchpad, private because stack traces might contain sensitive information which shouldn’t be public.  We have errors.ubuntu.com which can monitor crashes and provide a much clearer picture of which crashers are affecting numerous people and which are one-offs.  We will use errors.ubuntu.com instead of trying to triage the 250 or so bugs which fall in to this category.
  3. Manually try to reproduce bugs and flag those which are still a problem.  This is where we need the most help.  We will create a list of bugs which need to be checked and then ask people to spend a few minutes trying to reproduce a chosen bug on 15.10.  If it’s still a problem then the tester would mark the bug as triaged or add a specific tag, or if it cannot be reproduced then they would mark the bug as Invalid.  This will give us a curated list of real bugs which we can then triage further to assess the impact and priority.  We will work through the triaged list in an agile manner and have regular meetings to review what has been fixed and decide on which bugs to focus on next.  By distributing this problem across many people we can get the job done in a reasonable time scale.

How you can help

First of all we need help in triaging the bug list.  You don’t need to be a superstar software developer to do this, everyone can help and contribute to Ubuntu.  You will need a Launchpad account though.  We will publish a link to a list of bugs in Launchpad for Unity 7 (and in time Compiz & Nux) which we think need manual checking.  The links are available at this wiki page:  https://wiki.ubuntu.com/BigDesktopBugScrub

Please choose a bug from this list and try to recreate it in 15.10.  If your main machine isn’t running 15.10 you could set up a virtual machine using VirtualBox.

  1. Choose a bug from the list.  The heat metric is a good indication of which bugs are more important to a lot of people.  The list is sorted by heat so selecting one from somewhere near the top is a good starting point.  It’s possible that someone else will be working on the same bug as you so check the comments to see if anyone has added anything recently.
  2. Can you recreate the bug?  There are a number of possible outcomes when you attempt to recreate the bug.  Listed below are the most common ones.  If you can’t match one of these categories directly, or don’t know what to do just leave the bug where it is and try a different one.
    1. No – I can’t understand from the report what the problem is:
      1. Add a comment along the lines of:  “Thank you for taking the time to report this bug.  Unfortunately we can’t work out how to recreate this bug from your description.  Please describe the process you go through to trigger this bug and then change the bug status to NEW.  See this page for more information. https://wiki.ubuntu.com/BigDesktopBugScrub”
      2. Set the bug status to Incomplete
    2. No – I’ve tried to but it doesn’t seem to be a problem any more:
      1. Add a comment along the lines of: “Thank you for taking the time to report this bug.  We have tried to recreate this on the latest release of Ubuntu and cannot reproduce it.  This bug is being marked as Invalid.  If you believe the problem to still exist in the latest version of Ubuntu please comment on why that is the case and change the bug status to NEW.”
      2. Set the bug status to Invalid
    3. Yes – it’s still a problem in 15.10:
      1. Add a comment along the lines of: “As part of the big bug review for 16.04 LTS I have tested this on 15.10 and the bug is still there.”
      2. Mark the bug as Triaged or, if you don’t have permission to do that, add the tag “desktop-bugscrub-triaged”
    4. Yes – but I don’t think it’s really a bug (perhaps a feature request):
      1. Add a comment along the lines of: “As part of the big bug review for 16.04 LTS I have tested this on 15.10 and the bug is still there.  I think this is a feature request rather than a bug.”
      2. Mark the bug as “Opinion”, or if you don’t have permission to do that, add the tag “desktop-bugscrub-opinion”
  3. Thank you!  We’re one bug closer to perfection!
  4. Lather, Rinse, Repeat


What happens next

Once we have a list of high quality, reproducible bug reports which are affecting many people we can start to chip away at them in a logical manner.  We will be using an Agile-like workflow:

  1. Meet at the start of a “sprint” to discuss which of the most important bugs (importance will be decided on a mixture of bug heat and expert knowledge) we will be working on during the sprint.  We will decide how many of the bugs we think are fixable in that sprint and take them into our backlog.  The backlog will be managed using Trello (https://trello.com/b/9YvUSYqq/unity-7).
  2. The sprint will start and developers will take bugs (Cards) from the backlog to work on.
  3. The card will move to the In Progress column.
  4. If there is a problem the bug will move to the blocked column and these cards will be discussed at regular intervals during the sprint.
  5. Once a bug is fixed it will move to the Review column.  A code review will be done and if everything is OK then the fix will be merged and automatically tested.  If there are problems it will move back to the In Progress column.
  6. At the end of the sprint the fixes will be demonstrated and everyone will have a chance to spot any problems with the fix.  If there is a problem the card will go back into the Backlog for more work next sprint.  If everything is OK then the card is moved to Done and that bug is now fixed.
  7. The next sprint will start and we will go back to step 1.


We will endeavour to do our reviews in a Hangout On Air so that everyone can join to see what progress is being made.  We will also use our IRC channel on Freenode #ubuntu-desktop.


Software developers who want to help

If you are a developer who wants to help fix the code as well as triage bugs please join us on IRC (#ubuntu-desktop on Freenode) and introduce yourself.  We can get you write access to the Trello board and invite you along to the Sprint planning and review meetings.  We’d love you to get involved.

Bug Squash Hours

In order to kick start the process we will be setting aside a few hours a week where a core Unity 7 developer will be available on IRC to help answer questions about bugs, and we’ll be working through the list as well.  Feel free to ask for help or come and join us while we work through the bug list.  The exact schedule will be announced as soon as we know what it is.


Ubuntu Online Summit

We will have a session at UOS to review how the bug triage is going, discuss our tooling and policy on which bugs to auto-close etc.

HOWTO: Very low power usage on Pro Mini V2 (Arduino clone)

Skill level:  Easy enough if you’ve got a soldering iron.


The Pro Mini V2 is an Arduino Pro Mini clone available on eBay for, typically, £1.50.  The version I buy is adjustable between 5v and 3.3v and has an ATmega 328 clocked at 8 MHz.  It’s an ideal board for development of IoT remote sensors and great for playing with and learning about the Arduino development environment.

Here’s a link to the version I buy and know works: 3V Pro Mini 2 Arduino Clone

When you want to put a sensor in a remote location the last thing you want to do is have to run a power cable to it. I’ve experimented with solar with generally poor results so battery operation is the obvious solution.  While Li-ion batteries offer higher energy density the sweet spot still seems to be the good ol’ alkaline battery.  They’re cheap, safe, recyclable and readily available.

For what it’s worth the Ikea alkaline batteries offer good value:  http://www.batteryshowdown.com/results-lo.html (I suggest buying as many packets as you can carry, so that you never ever have to go back there.  Unless you like arguing with your wife of course.)

Power Usage

The ATmega 328 has various power saving functions which involve putting it to sleep when not doing anything.  I use the Rocket Scream Low Power library to take care of putting the processor into a low power state, but I wasn’t seeing anything like the low power savings they detail on their site.

Some quick calculations:  Let’s assume an AA battery provides 2000 mAh.  I measured my Pro Mini V2 as drawing 6.7mA when powered up and doing things and 2.8mA when in sleep mode.  As a conservative estimate, let’s say it’s running for 1 hour in every 24 hour period and asleep the rest of the time.  That averages out to a draw of about 3mA.

For a 2000 mAh battery, that would give about 667 hours of runtime, or 28 days.  So a standard Pro Mini V2 could run for about a month on a pair of AA batteries.  Not bad, but I think changing the batteries every month is still going to be a bit of a drag.  Besides, Rocket Scream are seeing current draw in the microamp range when asleep.  There is clearly work to do.
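To show the working:

average draw = ((1 hour x 6.7mA) + (23 hours x 2.8mA)) / 24 hours ≈ 3mA
run time     = 2000mAh / 3mA ≈ 667 hours ≈ 28 days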

How to dramatically reduce the power consumption

In this photo you can see I’ve identified some sections which are related to the power usage of the Pro Mini.

The red section is the power LED.  This is always on when power is applied and sucks about 0.2mA when lit.  If you don’t need this to be lit all the time then you can easily remove it to save some juice.  I found the easiest way was to use a pair of cutters to snip/crush the middle of the LED and then use a soldering iron to remove the bits left over.

The green section is the on-board regulator.  If you are going to be supplying power to the board via a couple of AA batteries (each battery being 1.5v, so two is 3v) then you don’t need the regulator.  You can cut this off too if you like, but.. keep reading, there’s no need to hack it off.

Saving the best until last, the yellow section is the power-selection jumper to switch between 3.3v and 5v.  It passes the power supplied by the RAW pin through the regulator and on to the board.  The regulator is inherently inefficient.  You might think that you could bypass the on-board regulator by applying power to the Vcc pin instead, but it still seems to power the regulator.  By simply unsoldering this jumper you can disable the on-board regulator and save loads of power.  Once removed you will need to apply power to the Vcc pin at ~3.3V.  I used some solder-wick to clean up, but you could just scrape it off with a soldering iron if you need to.

Here’s one I prepared earlier.


With the jumper and LED removed. (Red and yellow boxes from previous image)


With the LED and solder jumper removed I measured the power usage again.  Running current is now down to 3.8mA, pretty much half of what it was.  But, most impressively the power used when asleep is down to 0.004mA.  4 microamps! Yay!

Some more quick calculations based on the same usage as before:  average current consumption drops to about 0.17mA.  That gives us 490 days, or 1.3 years, of run time off a pair of AA batteries.  That should allow for 2.5 minutes of “work” an hour.  Waking up, taking some readings and sending them off via a radio should take well under 1 minute, which should allow for more power usage by a radio.

Unless you’re going to run your Arduino off a permanently attached serial connector, just do this.  Get yourself a couple of AA batteries and a battery holder.  Apply the +ve side of the batteries to Vcc and the -ve to ground.  Stick your multi-meter in between the battery and the Vcc pin to measure the lovely low current usage.  You can read the battery voltage being supplied with the Secret Arduino Volt-meter trick.
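If you haven’t come across that trick before, it works by measuring the ATmega’s internal 1.1V bandgap reference against Vcc, so the chip can report its own supply voltage with no extra components.  The commonly circulated version of it looks roughly like this (the 1125300 constant assumes a nominal 1.1V reference and can be calibrated per chip):

long readVcc() {
  // Select the internal 1.1V reference, measured against AVcc
  ADMUX = _BV(REFS0) | _BV(MUX3) | _BV(MUX2) | _BV(MUX1);
  delay(2);                        // Give the reference time to settle
  ADCSRA |= _BV(ADSC);             // Start a conversion
  while (bit_is_set(ADCSRA, ADSC));
  long result = ADCL;              // Must read ADCL first
  result |= ADCH << 8;
  return 1125300L / result;        // 1.1V * 1023 * 1000 -> Vcc in millivolts
}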

I’ve got quite a few sensors around the house running with this set-up so I will monitor battery usage over the next few weeks or months and report back.


Coming Soon…

A write up of my cheapo IoT sensor network, including smoke detectors, door contact sensors, movement sensors, house-plant watering monitors, room temperature sensors and a weather station.  Plus, build an IoT sensor and buy a sausage roll for less than a fiver.


Multipath routing on a Raspberry Pi 2

Skill level:  Not for the faint hearted!

A few years ago, when I started working at home, I had a second ADSL line installed so that I could still get online if my ISP had an outage.  As well as fault tolerance I wanted to try and use all the available bandwidth rather than just have it sitting there “just in case”.  I achieved this using multi path routing and documented the solution here:  Over Engineering FTW.

This has been running really well on a Raspberry Pi for about 3 years (with an older kernel, see later in this post for why) but recently the SD card has started to fail.  Although this would be easy to fix (simply replace the SD card and copy my scripts over), the rural town I live in has just been upgraded to FTTC and so my connection speed has gone from about 8 Mbps to about 70 Mbps on each line.  The first generation Pi doesn’t have enough horsepower to cope with 70 Mbps, let alone 140 Mbps, and indeed its ethernet interface is only 100 Mbps.  I had a Raspberry Pi 2 spare anyway so I figured I would use that and add a second gigabit NIC so I could cope with the theoretical 140 Mbps connection to the internet, and since I had two NICs I might as well use both of them.

Physical layout

This is what I came up with:

New network config










  • Two lines coming from the cabinet to my house, one with Plusnet and one with TalkTalk
  • The Plusnet line:
    • It came with an OpenReach vDSL bridge and a crappy locked down router, so I chucked the router away and used PPPoE tools to bring up the PPP connection
    • The vDSL bridge talks to the Raspberry Pi over a VLAN to keep it separated from the other noise on the switch
    • Interface eth1.1000 is an unnumbered interface and pppoeconf uses a layer 2 discovery protocol to find the bridge
    • Once the PPP connection is established ppp1 can be used to route traffic to the internet
  • The TalkTalk line:
    • It too came with a crappy router, but no OpenReach bridge.  So I had to use it.
    • The TalkTalk router talks to the Raspberry Pi over VLAN 10.  Those ports are untagged on the switch, so as far as everyone on that network knows it’s just a self-contained LAN.
    • Interface eth0 on the Raspberry Pi has an address on that LAN and uses the TalkTalk router to talk to the internet
  • The main LAN:
    • Interface eth1 is used to connect to the main LAN
    • Clients on the LAN use the Raspberry Pi as their default gateway

With me so far?  Essentially we have the normal eth0 interface of the Pi connected to one LAN with its own router, and eth1 (a USB gigabit ethernet adapter) has a tagged VLAN for connection to the OpenReach bridge (eth1.1000) and an untagged default network for connecting to the main LAN.  Once the layer 2 connection with the bridge is established a PPP connection becomes the second route to the internet.

The death of route caching

Around version 3.6 of the Linux kernel “route caching” was removed.  With route caching in place you could set up a default route with multiple hops, something along the lines of:

ip route add default nexthop via <gateway-1> dev eth0 nexthop via <gateway-2> dev eth1

When a packet needed routing to the internet the kernel would do a round-robin selection of which route to use and then remember that route for a period of time.  The upshot of this was, for example, that if you connected to www.bbc.co.uk and got routed via the first connection, and so SNATed to that connection’s public address, then all subsequent traffic for that destination also got routed via the same route and had the same source IP address.  Without route caching the next packet to that same destination would (probably) use the other route, and in the case of my home user scenario would arrive from a different source IP address – my two internet connections having different IP addresses.  Although HTTP is a connectionless protocol this change of IP address did seem to freak some services out.  For protocols with connections the story is worse, e.g. packets of an SSH connection would arrive at the far end from two different IP addresses and probably get dropped.  Route caching was a simple fix for this issue and worked well, as far as I was concerned anyway.

I’m sure the reasons to remove it are valid, but for my simple use case it worked very well, and the alternative, and now only, option is to use connection marking to simulate the route caching.  When I first looked at it I was baffled and thought I would just go back to a pre-3.6 kernel and use route caching again.  But, in the standard Raspbian distro there isn’t a kernel old enough for the Raspberry Pi 2 to make use of it.

So I was stuck…  I had to use a Raspberry Pi 2 to get enough packet throughput to max out my internet connections, and I couldn’t use route caching because there wasn’t a kernel old enough.  This meant I was going to have to either compile my own kernel or learn to use connection marking.  Joy.

Alternative projects

The documentation for Netfilter is extensive but I found a lot of it to be out of date and very hard to grok.  I found a few projects which had already implemented connection tracking/marking, namely FWGuardian and Fault Tolerant Router.

FWGuardian is, as far as I can tell, designed for something orthogonal to my set up.  Where you might have lots of connections coming in to a server, or a number of offices which need to connect to other offices via pre-defined routes.  I played around with it for a while, and Humberto very kindly offered me support over email, but ultimately it was too involved and complex for my needs.  You should check out the project though if you have advanced requirements.  It’s got some brilliant features for a more enterprise oriented setup.

Fault Tolerant Router is a much simpler setup and matched my requirements very closely.  At its core it’s a Ruby script which can write your iptables rules and routing tables and constantly monitor the links.  If one goes down it can dynamically rewrite your rules and direct all traffic down the working connection.  However, it’s not expecting to use a PPP connection where gateways can change and it’s not really been tested with VLANs, although in practice it handled VLANs just fine.

But, at the end of the day, I wanted to learn how to do this myself and so I used the rules generated by Fault Tolerant Router to understand how connection marking was supposed to work and then started to implement my own home-grown solution for teh lolz.

Multi-path routing and connection marking

As I understand it, the idea with connection marking, or connection tracking – I’m not sure what the difference is, is that when a new conversation starts the packets are marked with an identifier.  You can then set ip rules to dictate which route packets with a particular mark take.  In essence once a new connection is established and a route selected, all other packets in that conversation take on the same mark and so the same route.  This emulates the route caching of the past.  I don’t really get how, in the case of an HTTP conversation (or flow) which is connectionless, all the packets in the conversation get marked the same.  This page has some more details, but I haven’t read it properly yet.  Anyway, we don’t know HOW it works, but it does.  Good enough.


First of all we need to create the iptables configuration to set up connection marking.  Here’s the relevant extract from the iptables.save file:

 [0:0] -A PREROUTING -i eth1 -j CONNMARK --restore-mark
 [0:0] -A PREROUTING -i ppp1 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
 [0:0] -A PREROUTING -i eth0 -m conntrack --ctstate NEW -j CONNMARK --set-mark 2
 [0:0] -A POSTROUTING -o ppp1 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
 [0:0] -A POSTROUTING -o eth0 -m conntrack --ctstate NEW -j CONNMARK --set-mark 2

-i = --in-interface and -o = --out-interface

These rules set a mark depending on which interface is used.  These changes happen in the mangle table.

Packets going in or out the WAN via ppp1 or eth0 which are a new connection are marked with a 1 or a 2 depending on which interface they use.  The decision about which route to use is done in the rules which we will see later.  Any packets coming in to eth1, so from the LAN, have their marks restored on the way in so they can be dealt with accordingly.

Now let’s have a look at the filter table:

 :INPUT DROP [0:0]
 :LAN_WAN - [0:0]
 :WAN_LAN - [0:0]
[0:0] -A INPUT -i lo -j ACCEPT
 [0:0] -A INPUT -i eth1 -j ACCEPT
 [0:0] -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
[0:0] -A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
 [0:0] -A FORWARD -i eth1 -o ppp1 -j LAN_WAN
 [0:0] -A FORWARD -i eth1 -o eth0 -j LAN_WAN
 [0:0] -A FORWARD -i ppp1 -o eth1 -j WAN_LAN
 [0:0] -A FORWARD -i eth0 -o eth1 -j WAN_LAN
## Clamp MSS (ideal for PPPoE connections)
 [0:0] -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
 [0:0] -A LAN_WAN -j ACCEPT
 [0:0] -A WAN_LAN -j REJECT

The default policy is set to DROP, so any packet not matching one of the rules is dropped.

INPUT applies to packets which are bound for the router itself.  Packets from the local interface are allowed, and packets from eth1 (the main LAN) are also allowed.

FORWARD applies to packets which are passing through the router on their way somewhere else.  Packets which are known to be part of an already in-progress session are allowed.  Packets are then categorised as LAN to WAN or WAN to LAN and dealt with by the LAN_WAN or WAN_LAN chains, getting accepted and rejected respectively.  All this boils down to: LAN clients using the Raspberry Pi as a router have their packets forwarded out to the internet, while packets coming in from the internet are rejected, the exception being those which are part of an on-going connection.

Clamping MSS to MTU deals with a particular issue with using PPPoE connections where the MTU can’t be the usual 1500 bytes.  Because a lot of ISPs block the ICMP messages that would normally deal with asking the client to send smaller packet sizes we use this handy trick to make sure that packets can go out unfragmented.  If you find that some web pages are slow to load and others are not, then try switching this on.  If you’re only using upstream ISP provided routers you probably don’t need this.

Lastly in iptables we enable SNAT or masquerading so that connections out to the internet appear to come from a valid internet routable IP address not our LAN IP address:

 [0:0] -A POSTROUTING -o ppp1 -j SNAT --to-source <ppp1-ip>
 [0:0] -A POSTROUTING -o eth0 -j SNAT --to-source <eth0-ip>

Routing tables

We’ve configured iptables to add a mark to traffic depending on which WAN interface it is going in or out of.  But this is only marking the packets, there is no logic to make sure that packets of the same mark use the same route.  To make this happen we use ip rules.

First create three new routing tables by editing /etc/iproute2/rt_tables.  I’ve added this to the bottom:

1 plusnet
2 talktalk
3 loadbal

Now we add a default route to the first two of those tables:

ip route add default via $PPP_GATEWAY_ADDRESS dev ppp1 src $PPP_IPADDR table plusnet
ip route add default via <talktalk-router-ip> dev eth0 src <eth0-ip> table talktalk

$PPP_GATEWAY_ADDRESS is set when the PPP session is established and can change each time.  We can look at ways to find that address later, but for now just substitute the “P-t-P” IP address from “ifconfig ppp1” (or whatever your ppp interface number is), or in the case of an ISP-provided router, the LAN side IP of that router.  The other addresses in angle brackets are placeholders for your own IPs.

This is simply creating a routing table with the name of the ISP that will be used and a default route which can find its way to the internet for that ISP.
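As an aside, one way you might pick up that gateway address automatically in a script is to pull the peer address straight off the ppp interface (a rough sketch – pppd also exports it as $IPREMOTE to its /etc/ppp/ip-up scripts):

PPP_GATEWAY_ADDRESS=$(ip -4 addr show dev ppp1 | grep -oP 'peer \K[0-9.]+')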

Next we create the loadbal routing table which is a combination of the previous two:

ip route add default table loadbal nexthop via $PPP_GATEWAY_ADDRESS dev ppp1 nexthop via <talktalk-router-ip> dev eth0

which is the same idea as we used in the old route caching days, a round-robin route which flicks between the two available routes to the internet.

ip rules

We’ve now created the iptables entries to track and mark traffic from each of the two ISPs and add some basic firewalling and IP masquerading.  We’ve also created a routing table for each ISP and a load-balancing table which splits the traffic between the two ISPs.

Now we need to create some rules to govern which of the routing tables is used for a particular connection.  The commands to do this are:

ip rule add from $PPP_IPADDR table plusnet pref 40000
ip rule add from <eth0-ip> table talktalk pref 40100
ip rule add fwmark 0x1 table plusnet pref 40200
ip rule add fwmark 0x2 table talktalk pref 40300
ip rule add from 0/0 table loadbal pref 40400

The rules are matched in numerical order based on preference and once a rule matches that’s it.  The first two rules make sure that traffic from the routers uses the correct table.

The important rules are the last three.  Traffic which has been marked “1” will always use the plusnet routing table, traffic marked as “2” will always use the talktalk routing table.  This ensures that all traffic which is part of an on-going conversation will always use the same router out to the internet, and so always come from the same IP address.

The last rule only matches traffic which is not already marked i.e. new conversations.  This routing table, as can be seen in the previous section, has a multi-path route to balance traffic between the two routes out.  Once a conversation is established the IPtables conntrack rules will mark the traffic and so one of the two fwmark rules will match.

Now delete the main default route so that the above rules don’t get bypassed with a route in the “main” table:

ip route del default

And that’s it.  You should now have a router which splits the traffic fairly evenly across two internet connections and keeps tabs on which packets should go out of which routers.  I’ve had this running for a month or so now, and it seems to be working fine.  I’ve had the Pi lock up a couple of times, but I think that’s related to the USB gigabit ethernet adapter.

Smart Netflix hacks

Services such as unblock-us allow you to work around some geographic content blocks by acting as your DNS server and replying with the IP address of, say, the US based Netflix server instead of the UK ones.  I’ve installed dnsmasq on my Pi as well and configured it to use the Unblock DNS servers instead of my ISP or Google servers.  The clients on the LAN get their network configuration over DHCP from the Pi which sets the DNS server address for the clients to the Pi itself which then handles DNS lookups using the Unblock servers upstream.  This works really well for most Netflix clients but I was having a lot of problems getting the Chromecast to work with Netflix and Unblock US.

It turns out that Google have hard-coded their own DNS servers into the Chromecast and so your local DNS settings are ignored.  Nice one Google.

Because we’re using a Linux box as our router we can do this:

iptables -t nat -A PREROUTING -s <Netflix Client IP>/32 -d -p udp --dport 53 -j DNAT --to <Alternative DNS Server IP Address>
iptables -t nat -A PREROUTING -s <Netflix Client IP>/32 -d -p udp --dport 53 -j DNAT --to <Alternative DNS Server IP Address>

Using the NAT table we rewrite the DNS lookup bound for Google’s DNS servers to send it to our dnsmasq server instead. lol.

Spreading interrupts across cores

Network cards have queues for tx and rx.  Higher end cards will typically have more queues, but on the Pi the on-board NIC (which is actually connected via USB) has one for tx and one for rx, as do the VLAN interfaces and the PPP interfaces.  Each of these queues has a CPU affinity and it seems that by default the queues all use the same CPU core.

When downloading an ISO with BitTorrent and the load-balancing set up I was able to achieve just over 10 MBytes a second.  But the Pi became really unresponsive.  Looking at top showed one CPU core maxed out in soft interrupts:

By adjusting the CPU affinity to spread these IRQs across multiple CPUs I squeezed out a tiny bit more network throughput, but more usefully the Pi remained responsive under heavy load:

The commands I used to do this are:

echo 1 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
echo 2 > /sys/class/net/eth1/queues/tx-0/xps_cpus
echo 2 > /sys/class/net/eth1/queues/rx-0/rps_cpus
echo 4 > /sys/class/net/eth1.1000/queues/tx-0/xps_cpus
echo 4 > /sys/class/net/eth1.1000/queues/rx-0/rps_cpus
echo 8 > /sys/class/net/ppp1/queues/tx-0/xps_cpus
echo 8 > /sys/class/net/ppp1/queues/rx-0/rps_cpus


Here’s a tgz file containing my iptables rules and a script to set up the above: routing

Update:  I’ve put the files in this Github repo:  https://github.com/8none1/multipathrouting

If you’re interested in helping to make the scripts a bit more generic and adding fault-tolerance let me know.