Big Bug Bonanza Ubuntu 16.04 LTS

The vast majority of Ubuntu desktop users prefer to stick with a long term support release (https://wiki.ubuntu.com/LTS) rather than the regular six-monthly releases, so 16.04 LTS represents the next big upgrade for most Ubuntu users.  16.04 LTS will run Unity 7 by default, as Ubuntu has done for the last six years, and our focus for the Unity 7 stack is fixing bugs which adversely affect the user experience of the desktop.

Over the years the bug lists for Unity 7 and Compiz have grown to become unmanageable.  To make sure we are focusing on the most important issues we have to do some serious tidying up of the bug lists and we need some help.

At the time of writing there are 2680 open bugs for Unity 7 (https://launchpad.net/ubuntu/+source/unity/+bugs), 1455 for Compiz (https://launchpad.net/ubuntu/+source/compiz/+bugs) and 322 for nux, our graphical toolkit (https://launchpad.net/ubuntu/+source/nux/+bugs).

We’re proposing to cut this down to size with the following plan:

  1. Close all bugs which relate to an unsupported release of Ubuntu.  We will do a manual review of the high heat bugs affecting unsupported releases first, but low heat bugs will most likely be closed by a robot.  The rationale is that the majority of these older bugs will have been fixed and that the original reporter is probably no longer affected by the bug and has forgotten to close it.  Plus manually screening each of these bugs cannot be done at this scale in a reasonable timeframe.  There will be some collateral damage which is an unfortunate but unavoidable side-effect.  Sorry if this affects you, but please do re-open the bug against a supported release.
  2. Close all private apport bugs and review public ones with a view to closing them as well.  Apport is the automated error reporting tool which runs when it detects a crash.  It can open a private bug in Launchpad, private because stack traces might contain sensitive information which shouldn’t be public.  We have errors.ubuntu.com which can monitor crashes and provide a much clearer picture of which crashers are affecting numerous people and which are one-offs.  We will use errors.ubuntu.com instead of trying to triage the 250 or so bugs which fall into this category.
  3. Manually try to reproduce bugs and flag those which are still a problem.  This is where we need the most help.  We will create a list of bugs which need to be checked and then ask people to spend a few minutes trying to reproduce a chosen bug on 15.10.  If it’s still a problem then the tester would mark the bug as triaged or add a specific tag, or if it cannot be reproduced then they would mark the bug as Invalid.  This will give us a curated list of real bugs which we can then triage further to assess the impact and priority.  We will work through the triaged list in an agile manner and have regular meetings to review what has been fixed and decide on which bugs to focus on next.  By distributing this problem across many people we can get the job done in a reasonable time scale.

How you can help

First of all we need help in triaging the bug list.  You don’t need to be a superstar software developer to do this; everyone can help and contribute to Ubuntu.  You will need a Launchpad account though.  We will publish a link to a list of bugs in Launchpad for Unity 7 (and in time Compiz & Nux) which we think need manual checking.  The links are available at this wiki page:  https://wiki.ubuntu.com/BigDesktopBugScrub

Please choose a bug from this list and try to recreate it in 15.10.  If your main machine isn’t running 15.10 you could set up a virtual machine using VirtualBox.

  1. Choose a bug from the list.  The heat metric is a good indication of which bugs are more important to a lot of people.  The list is sorted by heat so selecting one from somewhere near the top is a good starting point.  It’s possible that someone else will be working on the same bug as you so check the comments to see if anyone has added anything recently.
  2. Can you recreate the bug?  There are a number of possible outcomes when you attempt to recreate the bug.  Listed below are the most common ones.  If you can’t match one of these categories directly, or don’t know what to do, just leave the bug where it is and try a different one.
    1. No – I can’t understand from the report what the problem is:
      1. Add a comment along the lines of:  “Thank you for taking the time to report this bug.  Unfortunately we can’t work out how to recreate this bug from your description.  Please describe the process you go through to trigger this bug and then change the bug status to NEW.  See this page for more information. https://wiki.ubuntu.com/BigDesktopBugScrub”
      2. Set the bug status to Incomplete
    2. No – I’ve tried to but it doesn’t seem to be a problem any more:
      1. Add a comment along the lines of: “Thank you for taking the time to report this bug.  We have tried to recreate this on the latest release of Ubuntu and cannot reproduce it.  This bug is being marked as Invalid.  If you believe the problem to still exist in the latest version of Ubuntu please comment on why that is the case and change the bug status to NEW.”
      2. Set the bug status to Invalid
    3. Yes – it’s still a problem in 15.10:
      1. Add a comment along the lines of: “As part of the big bug review for 16.04 LTS I have tested this on 15.10 and the bug is still there.”
      2. Mark the bug as Triaged or, if you don’t have permission to do that, add the tag “desktop-bugscrub-triaged”
    4. Yes – but I don’t think it’s really a bug (perhaps a feature request):
      1. Add a comment along the lines of: “As part of the big bug review for 16.04 LTS I have tested this on 15.10 and the bug is still there.  I think this is a feature request rather than a bug.”
      2. Mark the bug as “Opinion”, or if you don’t have permission to do that, add the tag “desktop-bugscrub-opinion”
  3. Thank you!  We’re one bug closer to perfection!
  4. Lather, Rinse, Repeat

What happens next

Once we have a list of high quality, reproducible bug reports which are affecting many people we can start to chip away at them in a logical manner.  We will be using an Agile-like workflow:

  1. Meet at the start of a “sprint” to discuss which of the most important bugs (importance will be decided on a mixture of bug heat and expert knowledge) we will be working on during the sprint.  We will decide how many of the bugs we think are fixable in that sprint and take them into our backlog.  The backlog will be managed using Trello (https://trello.com/b/9YvUSYqq/unity-7).
  2. The sprint will start and developers will take bugs (Cards) from the backlog to work on.
  3. The card will move to the In Progress column.
  4. If there is a problem the bug will move to the Blocked column and these cards will be discussed at regular intervals during the sprint.
  5. Once a bug is fixed it will move to the Review column.  A code review will be done and if everything is OK then the fix will be merged and automatically tested.  If there are problems it will move back to the In Progress column.
  6. At the end of the sprint the fixes will be demonstrated and everyone will have a chance to spot any problems with the fix.  If there is a problem the card will go back into the Backlog for more work next sprint.  If everything is OK then the card is moved to Done and that bug is now fixed.
  7. The next sprint will start and we will go back to step 1.

We will endeavour to do our reviews in a Hangout On Air so that everyone can join to see what progress is being made.  We will also use our IRC channel on Freenode #ubuntu-desktop.

Software developers who want to help

If you are a developer who wants to help fix the code as well as triage bugs please join us on IRC (#ubuntu-desktop on Freenode) and introduce yourself.  We can get you write access to the Trello board and invite you along to the Sprint planning and review meetings.  We’d love you to get involved.

Bug Squash Hours

In order to kick start the process we will be setting aside a few hours a week where a core Unity 7 developer will be available on IRC to help answer questions about bugs and we’ll be working through the list as well.  Feel free to ask for help or come and join us while we work through the bug list.  Exact schedule will be announced as soon as we know what it is.

Ubuntu Online Summit

We will have a session at UOS to review how the bug triage is going, discuss our tooling and policy on which bugs to auto-close etc.

Posted by / September 14, 2015 / Posted in Ubuntu

HOWTO: Very low power usage on Pro Mini V2 (Arduino clone)

Skill level:  Easy enough if you’ve got a soldering iron.

Introduction

The Pro Mini V2 is an Arduino Pro Mini clone available on eBay for, typically, £1.50.  The version I buy is adjustable between 5v and 3.3v and has an ATmega 328 clocked at 8 MHz.  It’s an ideal board for development of IoT remote sensors and great for playing with and learning about the Arduino development environment.

Here’s a link to the version I buy and know works: 3V Pro Mini 2 Arduino Clone

When you want to put a sensor in a remote location the last thing you want to do is have to run a power cable to it. I’ve experimented with solar with generally poor results so battery operation is the obvious solution.  While Li-ion batteries offer higher energy density the sweet spot still seems to be the good ol’ alkaline battery.  They’re cheap, safe, recyclable and readily available.

For what it’s worth the Ikea alkaline batteries offer good value:  http://www.batteryshowdown.com/results-lo.html (I suggest buying as many packets as you can carry, so that you never ever have to go back there.  Unless you like arguing with your wife of course.)

Power Usage

The ATmega 328 has various power saving functions which involve putting it to sleep when not doing anything.  I use the Rocket Scream Low Power library to take care of putting the processor into a low power state, but I wasn’t seeing anything like the low power figures they detail on their site.

Some quick calculations:  Let’s assume a AA battery provides 2000 mAh.  I measured my Pro Mini V2 as drawing 6.7mA when powered up and doing things and 2.8mA when in sleep mode.  As a conservative estimate, let’s say it’s running for 1 hour in every 24 hour period and asleep the rest of the time.  That averages out to (6.7 × 1 + 2.8 × 23) / 24, or about 3mA of draw.

For a 2000 mAh battery, that would give about 667 hours of runtime, or 28 days. So a standard Pro Mini V2 could run for about a month on a pair of AA batteries.  Not bad, but I think changing the batteries every month is still going to be a bit of a drag. Besides, Rocket Scream are seeing power usages in the micro-amps range when asleep.  There is clearly work to do.
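
The arithmetic above can be sketched as a couple of small shell functions.  This is only a back-of-envelope model: the function names avg_current_ma and runtime_days are made up for illustration, and it assumes the full 2000 mAh is usable, ignoring self-discharge and the voltage dropping as the cells run down.

```shell
#!/bin/sh
# Back-of-envelope battery life estimate.  Function names are illustrative only.

# avg_current_ma AWAKE_MA SLEEP_MA AWAKE_HOURS_PER_DAY
# Time-weighted average current over a 24 hour day, in mA.
avg_current_ma() {
    awk -v awake="$1" -v asleep="$2" -v hours="$3" \
        'BEGIN { printf "%.2f\n", (awake * hours + asleep * (24 - hours)) / 24 }'
}

# runtime_days CAPACITY_MAH AVERAGE_MA
# Days of runtime, assuming the whole rated capacity can be used.
runtime_days() {
    awk -v cap="$1" -v avg="$2" 'BEGIN { printf "%.0f\n", cap / avg / 24 }'
}

# Stock Pro Mini V2 figures from above: 6.7 mA awake, 2.8 mA asleep, 1 hour a day awake.
avg=$(avg_current_ma 6.7 2.8 1)
echo "average draw: ${avg} mA"
echo "runtime: $(runtime_days 2000 "$avg") days"
```

Swapping in your own measurements is just a case of changing the three arguments to avg_current_ma.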

How to dramatically reduce the power consumption

In this photo you can see I’ve identified some sections which are related to the power usage of the Pro Mini.

arduino_highlights

The red section is the power LED.  This is always on when power is applied and sucks about 0.2mA when lit. If you don’t need this to be lit all the time then you can easily remove it to save some juice.  I found the easiest way was to use a pair of cutters to snip/crush the middle of the LED and then use a soldering iron to remove the bits left over.

The green section is the on-board regulator.  If you are going to be supplying power to the board via a couple of AA batteries (each battery being 1.5v, so two is 3v) then you don’t need the regulator.  You could cut this off too if you like, but keep reading: there’s no need to hack it off.

Saving the best until last, the yellow section is the power-selection jumper to switch between 3.3v and 5v.  It passes the power supplied by the RAW pin through the regulator and on to the board.  The regulator is inherently inefficient.  You might think that you could bypass the on-board regulator by applying power to the Vcc pin instead, but that still seems to power the regulator.  By simply unsoldering this jumper you can disable the on-board regulator and save loads of power.  Once it is removed you will need to apply power to the Vcc pin at about 3.3V.  I used some solder-wick to clean up but you could just scrape it off with a soldering iron if you need to.

Here’s one I prepared earlier.

IMG_1029

With the jumper and LED removed. (Red and yellow boxes from previous image)

With the LED and solder jumper removed I measured the power usage again.  Running current is now down to 3.8mA, pretty much half of what it was.  But, most impressively the power used when asleep is down to 0.004mA.  4 microamps! Yay!

Some more quick calculations based on the same usage as before:  average current draw drops to about 0.17mA.  That gives us about 490 days, or 1.3 years, of run time off a pair of AA batteries.  One hour of “work” in every 24 is the equivalent of 2.5 minutes an hour.  Waking up, taking some readings and sending them off via a radio should take well under 1 minute, which leaves some headroom for the extra current a radio will draw.
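
To see how much the awake time matters once sleep current is down in the microamps, here is a rough sketch that works per hour rather than per day.  The function name runtime_days_for_duty is hypothetical, the 3.8 mA and 0.004 mA figures are the measurements above, and the same caveat applies about assuming all 2000 mAh is usable.  The unrounded result for 2.5 minutes an hour comes out a little above 490 days, because that figure uses the average rounded up to 0.17 mA.

```shell
#!/bin/sh
# runtime_days_for_duty AWAKE_MA SLEEP_MA AWAKE_MIN_PER_HOUR CAPACITY_MAH
# Estimated days of runtime for a board that is awake for a given number of
# minutes in every hour and asleep for the rest.  Illustrative name only.
runtime_days_for_duty() {
    awk -v awake="$1" -v asleep="$2" -v mins="$3" -v cap="$4" 'BEGIN {
        avg = (awake * mins + asleep * (60 - mins)) / 60   # average mA
        printf "%.0f\n", cap / avg / 24                    # hours -> days
    }'
}

# Modified board: 3.8 mA awake, 0.004 mA asleep, 2000 mAh of battery.
for mins in 0.5 1 2.5; do
    echo "$mins min/hour awake: $(runtime_days_for_duty 3.8 0.004 "$mins" 2000) days"
done
```

With sleep current this low, almost all of the budget goes on the time spent awake, so shortening the wake period is now the thing worth optimising.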

Conclusion

Unless you’re going to run your Arduino off a permanently attached serial connector, then just do this.  Get yourself a couple of AA batteries & a battery holder.  Apply the +ve side of the batteries to Vcc and the -ve to ground.  Stick your multi-meter in between the battery and Vcc pin to measure the lovely low current usage.  You can read the battery voltage being provided with the Secret Arduino Volt-meter trick.

I’ve got quite a few sensors around the house running with this set-up so I will monitor battery usage over the next few weeks or months and report back.

Coming Soon…

A write up of my cheapo IoT sensor network, including smoke detectors, door contact sensors, movement sensors, house-plant watering monitors, room temperature sensors and a weather station.  Plus, build an IoT sensor and buy a sausage roll for less than a fiver.

Multipath routing on a Raspberry Pi 2

Skill level:  Not for the faint hearted!

A few years ago, when I started working at home, I had a second ADSL line installed so that I could still get online if my ISP had an outage.  As well as fault tolerance I wanted to try and use all the available bandwidth rather than just have it sitting there “just in case”.  I achieved this using multi path routing and documented the solution here:  Over Engineering FTW.

This has been running really well on a Raspberry Pi for about 3 years (with an older kernel, see later in this post for why) but recently the SD card has started to fail.  This would be easy to fix (simply replace the SD card and copy my scripts over), but the rural town I live in has just been upgraded to FTTC and so my connection speed has gone from about 8 Mbps to about 70 Mbps on each line.  The first generation Pi doesn’t have enough horsepower to cope with 70 Mbps let alone 140 Mbps, and indeed the ethernet interface is only 100 Mbps.  I had a Raspberry Pi 2 spare anyway so I figured I would use that and add a second NIC, a gigabit one, so I could cope with the theoretical 140 Mbps connection to the internet, and since I had two NICs I might as well use both of them.

Physical layout

This is what I came up with:

New network config

  • Two lines coming from the cabinet to my house, one with Plusnet and one with TalkTalk
  • The Plusnet line:
    • It came with an OpenReach vDSL bridge and a crappy locked down router, so I chucked the router away and used PPPoE tools to bring up the PPP connection
    • The vDSL bridge talks to the Raspberry Pi over a VLAN to keep it separated from the other noise on the switch
    • Interface eth1.1000 is an unnumbered interface and pppoeconf uses a layer 2 discovery protocol to find the bridge
    • Once the PPP connection is established ppp1 can be used to route traffic to the internet
  • The TalkTalk line:
    • It too came with a crappy router, but no OpenReach bridge.  So I had to use it.
    • The TalkTalk router talks to the Raspberry Pi over VLAN 10.  Those ports are untagged on the switch, so as far as everyone on that network knows it’s just a self contained LAN.
    • Interface eth0 on the Raspberry Pi has an address on that LAN and uses the TalkTalk router to talk to the internet
  • The main LAN:
    • Interface eth1 is used to connect to the main LAN
    • Clients on the LAN use the Raspberry Pi as their default gateway

With me so far?  Essentially we have the normal eth0 interface of the Pi connected to one LAN with its own router, while eth1 (a USB gigabit ethernet adapter) has a tagged VLAN for connection to the OpenReach bridge (eth1.1000) and an untagged default network for connecting to the main LAN.  Once the layer 2 connection with the bridge is established a PPP connection becomes the second route to the internet.

The death of route caching

Around version 3.6 of the Linux kernel “route caching” was removed.  With route caching in place you could set up a default route with multiple next hops, something along the lines of:

ip route add default nexthop via 192.168.1.254 dev eth0 nexthop via 192.168.2.254 dev eth1

When a packet needed routing to the internet the kernel would do a round-robin selection of which route to use and then remember that route for a period of time.  The upshot of this was, for example, that if you connected to www.bbc.co.uk and got routed first via 192.168.1.254 and so SNATed to 212.159.20.70 then all subsequent traffic for that destination also got routed via the same route and had the same source IP address.  Without route caching the next packet to that same destination would (probably) use the other route, and in the case of my home user scenario would arrive from a different source IP address, my two internet connections having different IP addresses.  Although HTTP is a stateless protocol, and a browser will happily open several separate TCP connections to the same site, this change of IP address between requests did seem to freak some services out.  For a single long-lived connection the story is worse, e.g. packets of an SSH connection would arrive at the far end from two different IP addresses and probably get dropped.  Route caching was a simple fix for this issue and worked well, as far as I was concerned anyway.

I’m sure the reasons to remove it are valid, but for my simple use case it worked very well, and the alternative (now the only option) is to use connection marking to simulate the route caching.  When I first looked at it I was baffled and thought I would just go back to a pre 3.6 kernel and use route caching again.  But, in the standard Raspbian distro there isn’t a kernel old enough for the Raspberry Pi 2 to make use of it.

So I was stuck…  I had to use a Raspberry Pi 2 to get enough packet throughput to max out my internet connections, and I couldn’t use route caching because there wasn’t a kernel old enough.  This meant I was going to have to either compile my own kernel or learn to use connection marking.  Joy.

Alternative projects

The documentation for Netfilter is extensive but I found a lot of it to be out of date and very hard to grok.  I found a few projects which had already implemented connection tracking/marking, namely FWGuardian and Fault Tolerant Router.

FWGuardian is, as far as I can tell, designed for something orthogonal to my set up: cases where you might have lots of connections coming in to a server, or a number of offices which need to connect to other offices via pre-defined routes.  I played around with it for a while, and Humberto very kindly offered me support over email, but ultimately it was too involved and complex for my needs.  You should check out the project though if you have advanced requirements.  It’s got some brilliant features for a more enterprise oriented setup.

Fault Tolerant Router is a much simpler setup and matched my requirements very closely.  At its core it’s a Ruby script which can write your iptables rules and routing tables and constantly monitor the links.  If one goes down it can dynamically rewrite your rules and direct all traffic down the working connection.  However, it’s not expecting to use a PPP connection where gateways can change and it’s not really been tested with VLANs, although in practice it handled VLANs just fine.

But, at the end of the day, I wanted to learn how to do this myself and so I used the rules generated by Fault Tolerant Router to understand how connection marking was supposed to work and then started to implement my own home-grown solution for teh lolz.

Multi-path routing and connection marking

As I understand it, connection tracking is Netfilter keeping a table of every conversation (or flow) passing through the router, identified by protocol, source and destination addresses and ports.  Connection marking builds on that: when a new conversation starts, a mark is stored against its entry in the connection tracking table, and that mark can be copied back on to later packets belonging to the same conversation.  You can then set ip rules to dictate which route packets with a particular mark take.  In essence once a new connection is established and a route selected, all other packets in that conversation take on the same mark and so the same route.  This emulates the route caching of the past.  It works for HTTP too, because HTTP runs over TCP and each TCP connection is a tracked flow like any other.  This page has some more details, but I haven’t read it properly yet.  Anyway, it works.  Good enough.

IPtables

First of all we need to create the iptables configuration to set up connection marking.  Here’s the relevant extract from the iptables.save file:

*mangle
:PREROUTING ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:INPUT ACCEPT [0:0]
[0:0] -A PREROUTING -i eth1 -j CONNMARK --restore-mark
[0:0] -A PREROUTING -i ppp1 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
[0:0] -A PREROUTING -i eth0 -m conntrack --ctstate NEW -j CONNMARK --set-mark 2
[0:0] -A POSTROUTING -o ppp1 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
[0:0] -A POSTROUTING -o eth0 -m conntrack --ctstate NEW -j CONNMARK --set-mark 2

-i = --in-interface and -o = --out-interface

These rules set a mark depending on which interface is used.  These changes happen in the mangle table.

Packets going in or out the WAN via ppp1 or eth0 which are a new connection are marked with a 1 or a 2 depending on which interface they use.  The decision about which route to use is done in the rules which we will see later.  Any packets coming in to eth1, so from the LAN, have their marks restored on the way in so they can be dealt with accordingly.
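
The pattern in those mangle rules is regular enough to generate.  As a minimal sketch, the function below prints the same five rules for one LAN interface and any number of uplinks.  The name print_mangle_rules and the IFACE:MARK argument format are my own invention for illustration, and it only prints the rules; it does not load them into iptables.

```shell
#!/bin/sh
# print_mangle_rules LAN_IF WAN_IF:MARK [WAN_IF:MARK ...]
# Prints mangle-table rules in the same style as the extract above.
# Hypothetical helper: it only echoes text, nothing is applied.
print_mangle_rules() {
    lan_if=$1
    shift
    # Packets arriving from the LAN get any existing connection mark restored.
    echo "-A PREROUTING -i $lan_if -j CONNMARK --restore-mark"
    # New connections are marked according to the uplink they arrive on
    # (PREROUTING -i) or leave by (POSTROUTING -o).
    for hook in "PREROUTING -i" "POSTROUTING -o"; do
        for uplink in "$@"; do
            wan_if=${uplink%%:*}
            mark=${uplink##*:}
            echo "-A $hook $wan_if -m conntrack --ctstate NEW -j CONNMARK --set-mark $mark"
        done
    done
}

print_mangle_rules eth1 ppp1:1 eth0:2
```

Adding a third uplink would then just be a case of passing another IFACE:MARK pair, along with a matching routing table and ip rule.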

Now let’s have a look at the filter table:

*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
:LAN_WAN - [0:0]
:WAN_LAN - [0:0]
[0:0] -A INPUT -i lo -j ACCEPT
[0:0] -A INPUT -i eth1 -j ACCEPT
[0:0] -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
[0:0] -A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
[0:0] -A FORWARD -i eth1 -o ppp1 -j LAN_WAN
[0:0] -A FORWARD -i eth1 -o eth0 -j LAN_WAN
[0:0] -A FORWARD -i ppp1 -o eth1 -j WAN_LAN
[0:0] -A FORWARD -i eth0 -o eth1 -j WAN_LAN
## Clamp MSS (ideal for PPPoE connections)
[0:0] -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
[0:0] -A LAN_WAN -j ACCEPT
[0:0] -A WAN_LAN -j REJECT

The default policy for INPUT and FORWARD is set to DROP, so any packet not matching one of the rules is dropped.

INPUT applies to packets which are bound for the router itself.  Packets from the loopback interface are allowed, packets from eth1 (the main LAN) are allowed, and so are packets belonging to connections which are already established.

FORWARD applies to packets which are passing through the router on their way somewhere else.  Packets which are known to be part of an already in-progress session are allowed.  Packets are then categorised as LAN to WAN or WAN to LAN and dealt with by the chains LAN_WAN or WAN_LAN, getting accepted and rejected respectively.  All this boils down to: LAN clients using the Raspberry Pi as a router have their packets forwarded out, and packets coming in from the internet are rejected unless they are part of an on-going connection.

Clamping MSS to MTU deals with a particular issue with using PPPoE connections where the MTU can’t be the usual 1500 bytes (PPPoE adds 8 bytes of overhead, so the MTU is typically 1492).  Because a lot of ISPs block the ICMP messages that would normally deal with asking the client to send smaller packet sizes we use this handy trick to make sure that packets can go out unfragmented.  If you find that some web pages are slow to load and others are not, then try switching this on.  If you’re only using upstream ISP provided routers you probably don’t need this.
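
For a feel of the numbers involved, here is a small sketch of the sum the clamp is effectively doing.  It assumes plain IPv4 and TCP headers of 20 bytes each with no options, and standard PPPoE over 1500 byte Ethernet; the function name mss_for_mtu is made up for this example.

```shell
#!/bin/sh
# mss_for_mtu MTU
# Largest TCP payload that fits in one packet of the given MTU, assuming a
# 20 byte IPv4 header and a 20 byte TCP header (no options).
mss_for_mtu() {
    echo $(( $1 - 20 - 20 ))
}

ETHERNET_MTU=1500
PPPOE_OVERHEAD=8   # 6 byte PPPoE header + 2 byte PPP protocol field
pppoe_mtu=$(( ETHERNET_MTU - PPPOE_OVERHEAD ))

echo "Ethernet MTU $ETHERNET_MTU -> MSS $(mss_for_mtu "$ETHERNET_MTU")"
echo "PPPoE MTU $pppoe_mtu -> MSS $(mss_for_mtu "$pppoe_mtu")"
```

A LAN client on ordinary Ethernet will offer the larger MSS in its SYN, which is why the router has to rewrite it on the way through to the PPPoE link.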

Lastly in iptables we enable SNAT or masquerading so that connections out to the internet appear to come from a valid internet routable IP address not our LAN IP address:

#SNAT: LAN --> WAN
[0:0] -A POSTROUTING -o ppp1 -j SNAT --to-source 212.159.20.70
[0:0] -A POSTROUTING -o eth0 -j SNAT --to-source 192.168.1.253

Routing tables

We’ve configured iptables to add a mark to traffic depending on which WAN interface it is going in or out of.  But this is only marking the packets, there is no logic to make sure that packets of the same mark use the same route.  To make this happen we use ip rules.

First create three new routing tables by editing /etc/iproute2/rt_tables.  I’ve added this to the bottom:

1 plusnet
2 talktalk
3 loadbal

Now we add a default route to the first two of those tables:

ip route add default via $PPP_GATEWAY_ADDRESS dev ppp1 src 212.159.20.70 table plusnet
ip route add default via 192.168.1.254 dev eth0 src 192.168.1.253 table talktalk

$PPP_GATEWAY_ADDRESS is the address of the far end of the PPP link.  It is set when the PPP session is established and can change from one session to the next.  We can look at ways to find that address later, but for now just substitute the “P-t-P” IP address from “ifconfig ppp1” (or whatever your ppp interface number is).  For an ISP-provided router, use the LAN side IP of that router instead.
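
One possible way to pick that address up in a script, sketched here rather than taken from my actual setup, is to parse the output of the ip command, which shows the far end of a point-to-point link after the word “peer”.  The function name peer_address is hypothetical, and the sample line uses a made-up peer address from the documentation range so the parsing can be tried without a PPP link.

```shell
#!/bin/sh
# peer_address
# Reads `ip -4 -o addr show dev <iface>` style output on stdin and prints the
# address that follows the word "peer", with any /prefix removed.
peer_address() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "peer") {
                sub(/\/.*/, "", $(i + 1))
                print $(i + 1)
                exit
            }
        }
    }'
}

# On the router itself this would be something like:
#   PPP_GATEWAY_ADDRESS=$(ip -4 -o addr show dev ppp1 | peer_address)

# Offline demonstration with an invented peer address:
sample='5: ppp1    inet 212.159.20.70 peer 203.0.113.1/32 scope global ppp1'
echo "$sample" | peer_address
```

If the output format differs on your system the function simply prints nothing, so it is worth checking for an empty result before using it in a route command.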

This is simply creating a routing table with the name of the ISP that will be used and a default route which can find its way to the internet for that ISP.

Next we create the loadbal routing table which is a combination of the previous two:

ip route add default table loadbal nexthop via $PPP_GATEWAY_ADDRESS dev ppp1 nexthop via 192.168.1.254 dev eth0

which is the same idea as we used in the old route caching days, a round-robin route which flicks between the two available routes to the internet.

ip rules

We’ve now created the iptables entries to track and mark traffic from each of the two ISPs and add some basic firewalling and IP masquerading.  We’ve also created a routing table for each ISP and a load-balancing table which splits the traffic between the two ISPs.

Now we need to create some rules to govern which of the routing tables is used for a particular connection.  The commands to do this are:

ip rule add from $PPP_IPADDR table plusnet pref 40000
ip rule add from 192.168.1.253 table talktalk pref 40100
ip rule add fwmark 0x1 table plusnet pref 40200
ip rule add fwmark 0x2 table talktalk pref 40300
ip rule add from 0/0 table loadbal pref 40400

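The rules are checked in numerical order of preference, and the first matching rule whose table contains a usable route decides where the packet goes.  The first two rules make sure that traffic sent from the Pi’s own WAN-facing addresses uses the correct table; $PPP_IPADDR is the local address of the PPP link, 212.159.20.70 in my case.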

The important rules are the last three.  Traffic which has been marked “1” will always use the plusnet routing table, traffic marked as “2” will always use the talktalk routing table.  This ensures that all traffic which is part of an on-going conversation will always use the same router out to the internet, and so always come from the same IP address.

The last rule only matches traffic which is not already marked, i.e. new conversations.  This routing table, as can be seen in the previous section, has a multi-path route to balance traffic between the two routes out.  Once a conversation is established the iptables conntrack rules will mark the traffic and so one of the two fwmark rules will match.

Now delete the main default route so that the above rules don’t get bypassed with a route in the “main” table:

ip route del default
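
To check the result, the commands below list the rules and tables and ask the kernel which route it would pick for a marked packet.  This is a sketch with a hypothetical run wrapper: by default it only prints each command, and setting DRY_RUN=0 (as root, on the router) actually runs them.  8.8.8.8 is just an arbitrary internet address to look up.

```shell
#!/bin/sh
# run CMD [ARGS...]
# Prints the command when DRY_RUN is 1 (the default), otherwise executes it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# check_routing: inspect the policy routing set up in this post.
check_routing() {
    run ip rule show
    for table in plusnet talktalk loadbal; do
        run ip route show table "$table"
    done
    # Which way would a packet carrying each mark be sent?
    run ip route get 8.8.8.8 mark 1
    run ip route get 8.8.8.8 mark 2
}

check_routing
```

On a working setup the two “route get” lookups should come back via ppp1 and eth0 respectively.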

And that’s it.  You should now have a router which splits the traffic fairly evenly across two internet connections and keeps tabs on which packets should go out of which routers.  I’ve had this running for a month or so now, and it seems to be working fine.  I’ve had the Pi lock up a couple of times, but I think that’s related to the USB gigabit ethernet adapter.

Smart Netflix hacks

Services such as unblock-us allow you to work around some geographic content blocks by acting as your DNS server and replying with the IP address of, say, the US based Netflix server instead of the UK ones.  I’ve installed dnsmasq on my Pi as well and configured it to use the Unblock DNS servers instead of my ISP or Google servers.  The clients on the LAN get their network configuration over DHCP from the Pi which sets the DNS server address for the clients to the Pi itself which then handles DNS lookups using the Unblock servers upstream.  This works really well for most Netflix clients but I was having a lot of problems getting the Chromecast to work with Netflix and Unblock US.

It turns out that Google have hard-coded its own DNS servers into the Chromecast and so your local DNS settings are ignored.  Nice one Google.

Because we’re using a Linux box as our router we can do this:

iptables -t nat -A PREROUTING -s <Netflix Client IP>/32 -d 8.8.8.8 -p udp --dport 53 -j DNAT --to-destination <Alternative DNS Server IP Address>
iptables -t nat -A PREROUTING -s <Netflix Client IP>/32 -d 8.8.4.4 -p udp --dport 53 -j DNAT --to-destination <Alternative DNS Server IP Address>

Using the NAT table we rewrite the DNS lookup bound for Google’s DNS servers to send it to our dnsmasq server instead. lol.
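
The two rules above only catch DNS over UDP.  DNS can also fall back to TCP on port 53 for large responses, so a slightly more thorough version would redirect both.  Here is a sketch that prints the four rules for a given client; the function name print_dns_redirect and both example addresses are made up, and nothing is applied until you run the printed commands yourself.

```shell
#!/bin/sh
# print_dns_redirect CLIENT_IP NEW_DNS_IP
# Prints DNAT rules sending the client's lookups for Google's public DNS
# servers to another resolver, for both UDP and TCP.  Hypothetical helper.
print_dns_redirect() {
    client=$1
    new_dns=$2
    for google_dns in 8.8.8.8 8.8.4.4; do
        for proto in udp tcp; do
            echo "iptables -t nat -A PREROUTING -s $client/32 -d $google_dns -p $proto --dport 53 -j DNAT --to-destination $new_dns"
        done
    done
}

# Example: a Chromecast at 192.168.0.50 redirected to dnsmasq on 192.168.0.1
# (both addresses invented for this example).
print_dns_redirect 192.168.0.50 192.168.0.1
```

Piping the output to sh as root would apply the rules, once you are happy with what it prints.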

Spreading interrupts across cores

Network cards have queues for tx and rx.  Higher end cards will typically have more queues, but on the Pi the on-board NIC (which is actually connected via USB) has one for tx and one for rx, as do the VLAN interfaces and the PPP interfaces.  Each of these queues has a CPU affinity and it seems that by default the queues all use the same CPU core.

When downloading an ISO with BitTorrent and the load-balancing set up I was able to achieve just over 10 MBytes a second.  But the Pi became really unresponsive.  Looking at top showed one CPU core maxed out in soft interrupts:

without_queues_spread

By adjusting the CPU affinity to spread these queues across multiple CPUs I squeezed out a tiny bit more network throughput, but more usefully the Pi remained responsive under heavy load:

with_queues_spread

The commands I used to do this are:

echo 1 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
echo 2 > /sys/class/net/eth1/queues/tx-0/xps_cpus
echo 2 > /sys/class/net/eth1/queues/rx-0/rps_cpus
echo 4 > /sys/class/net/eth1.1000/queues/tx-0/xps_cpus
echo 4 > /sys/class/net/eth1.1000/queues/rx-0/rps_cpus
echo 8 > /sys/class/net/ppp1/queues/tx-0/xps_cpus
echo 8 > /sys/class/net/ppp1/queues/rx-0/rps_cpus
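
The values written to rps_cpus and xps_cpus are hexadecimal bitmasks with one bit per CPU core, so 1, 2, 4 and 8 select cores 0, 1, 2 and 3 of the Pi 2 respectively.  As a small sketch, a helper like this (the name cpu_mask is my own) works out the mask for any set of cores:

```shell
#!/bin/sh
# cpu_mask CORE [CORE ...]
# Prints the hex bitmask with one bit set for each listed core number,
# in the form expected by rps_cpus and xps_cpus.  Illustrative helper.
cpu_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '%x\n' "$mask"
}

cpu_mask 0      # core 0 only
cpu_mask 3      # core 3 only
cpu_mask 2 3    # cores 2 and 3 together
```

It could be used along the lines of echo "$(cpu_mask 3)" > /sys/class/net/ppp1/queues/rx-0/rps_cpus.  Note that these settings are lost on reboot, and for ppp1 whenever the interface is recreated, so they need re-applying from a script.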

Source

Here’s a tgz file containing my iptables rules and a script to set up the above: routing

Update:  I’ve put the files in this Github repo:  https://github.com/8none1/multipathrouting

If you’re interested in helping to make the scripts a bit more generic and adding fault-tolerance let me know.