Cisco 7941, Asterisk and SIP

I got a Cisco 7941 off eBay.  This is a phone which was £400 when new (some time around 2004) but can now be picked up for about £10.  These phones went End Of Sale in January 2010, so even if mine was one of the last phones to roll off the production line it’s still about 7 years old but it’s still working perfectly.  A testament to the good build quality of these phones, and perhaps the previous owner’s careful handling.

Since these devices are no longer supported many companies will be getting rid of them (or probably already have) so there should be some bargains to be had for phone geeks.

Q: Does the Cisco 7941 work with Asterisk?
A: Yes.  You need to load the SIP firmware (the focus of this post) or chan-sccp (out of scope for this post but I’ll check it out at some point).

Q: Does the Cisco 7941 work with SIP?
A: Yes.  You need to flash the correct firmware though.

Q: Is it really hard to get working?
A: No.  If you’re comfortable with Linux and a few command line tools.  And assuming you already have Asterisk set up.

Q: Is a lot of the information on the web about how to set up the 7941 wrong?
A: Yes.  There is a lot of confusion about config files (the 7940 and 7941 use different ones).

Q: Will you tell us how you got your phone to work?
A: Yes!  However – this is what works for me.  You will need to tweak the config in places.

 

The steps to getting this phone working as a SIP extension on Asterisk on Ubuntu / Raspberry Pi:

  1. Set up a TFTP server
  2. Download the SIP firmware from Cisco
  3. Flash the phone with the firmware via the TFTP server
  4. Configure the SIP extension in Asterisk
  5. Write the config files for the phone and upload them via the TFTP server
  6. Make a call!
  7. Optional Extras
    1. Dial plan
    2. Ring tones
    3. Dial tones
    4. Wallpaper
    5. Telephone Directory
  8. Final Tip

Set up a TFTP Server

The phone will download it’s firmware and config via TFTP.  It needs to download it’s config on every boot, so you will always need a TFTP server running.  I think that if the TFTP server is unavailable it will just use the previous config, so it’s possible that you can get away without it, but I haven’t tried.  My recommendation is that you install dnsmasq.  It’s a small and full featured DNS server which also includes a DHCP & TFTP server which are easy to configure and it’s almost certainly packaged for your distro.  You should also (temporarily) disable any other DHCP servers on your local network so that dnsmasq is the only thing offering DHCP addresses.  This will simplify the process of getting the phone to find the TFTP server, since with dnsmasq it will all be automatic.  If you later re-enable your original DHCP server, say on your router, then you will need to configure it to give out the address of the dnsmasq TFTP server and disable DHCP on dnsmasq.  In my opinion, if you’re going to be running a Cisco IP phone on your network you’d be better off moving all DHCP to dnsmasq.

The full configuration of dnsmasq it’s out of scope for this doc, but in a nutshell you need these in your dnsmasq config:

  • Set up a DHCP range
dhcp-range=192.168.1.1,192.168.1.100,24h
  • Enable the TFTP server
enable-tftp
  • Set the TFTP path
tftp-root=/home/<your user>/tftp  (or whatever works for you)

Download the SIP Firmware from Cisco

Usually Cisco require a valid support contract before you can download anything useful from their website, but it seems that since these phones are now out of support they have offered up the firmware free of charge.  You do still need to register an account to download the files.  At the time of writing the latest version is 9.4.2 SR 3 dated 14th February 2017 – so bang up to date, even though these phones are end-of-life.  Bizarre, but good for us.  Thanks Cisco!

Go here: https://software.cisco.com/download/type.html?mdfid=280083379&catid=280789323

Follow the link to the SIP software.

You want to download the “SIP firmware files only”

Unzip that file into the root of your TFTP server (the location you set in the previous step).  You should have 8 files in there:

apps41.9-4-2ES26.sbn
dsp41.9-4-2ES26.sbn
term41.default.loads
cnu41.9-4-2ES26.sbn
jar41sip.9-4-2ES26.sbn
term61.default.loads
cvm41sip.9-4-2ES26.sbn
SIP41.9-4-2SR3-1S.loads

This is everything you need to reflash your phone to the latest SIP firmware.  Now you need to get the phone to reboot in to firmware download mode.

Flash the phone with the firmware via the TFTP server

  1. Unplug the phone from the power.  Make sure that the network cable is still connected (unless you’re using using PoE).
  2. Plug the power back in and hold down the # key
  3. Eventually you will see the “line” lights start to flash orange.  It might take a couple of minutes to get to this stage, don’t give up, just keep holding down #
  4. When the line lights are flashing type 123456789*0#  This will start firmware download mode.
  5. The screen will go black for a moment and then go through the process of getting an IP address and connecting to the TFTP server
  6. Once connected to the TFTP server the software download will start
  7. The phone will reboot once download is complete and present you with an “Unprovisioned” message on the screen.  This is good news!  The phone firmware has now been updated.

I put together a video showing this process.  It’s not very interesting but it will give you an idea of what to expect.  The actual downloading of the firmware section has been sped up 3X.

 

Configure the SIP extension in Asterisk

Now you need to configure the SIP extension in Asterisk.  Do this as per any other SIP extension, but bear this important piece of information in mind:  The Cisco 7941 can only deal with 8 character passwords, so keep your SIP authentication secret to 8 characters.

While you’re in Asterisk configuration mode, take a moment to note down these bits of information as well (in Advanced SIP settings in FreePBX):

  • RTP Port range, start and end.
  • Bind Port (probably 5060)

Write the config files for the phone and upload them via the TFTP server

Please take the time to read this section fully,  this is the part that is most troublesome.  The Cisco 7941 is very picky about it’s config file and even a small mistake will stop the phone from working.  These settings are specific to the 79×1 series of phones running at least version 8.x of the firmware.  If your phone is not a 79×1 and/or is not running v9.x.x of the firmware then these settings are not for you.

Once the phone has loaded it’s firmware and booted, it will go looking for a file called SEP<PHONE MAC ADDRESS>.cnf.xml.  So if the MAC address of your phone is 11:22:33:44:55:66 then the config file needs to be named SEP112233445566.cnf.xml.  This file needs to be in the root of your TFTP server.

You will see mention of a file called XMLDefault.cnf.xml.  If you’ve only got a few phones, don’t worry about this, you don’t need it.

So here is a config file which is about as minimal as I can make it:

<device>
    <deviceProtocol>SIP</deviceProtocol>
    <sshUserId>cisco</sshUserId>
    <sshPassword>cisco</sshPassword>
    <ipAddressMode>0</ipAddressMode>

    <devicePool>
        <dateTimeSetting>
            <dateTemplate>D/M/Ya</dateTemplate>
            <timeZone>GMT Standard/Daylight Time</timeZone>
            <ntps>
                <ntp>
                    <name>#IP ADDRESS OF AN NTP SERVER#</name>
                    <ntpMode>Unicast</ntpMode>
                </ntp>
            </ntps>
        </dateTimeSetting>

        <callManagerGroup>
            <members>
                <member priority="0">
                    <callManager>
                        <ports>
                            <ethernetPhonePort>2000</ethernetPhonePort>
                            <sipPort>#SIP PORT NUMBER FROM YOUR ASTERISK SERVER#</sipPort>
                        </ports>
                        <processNodeName>#IP ADDRESS OF YOUR ASTERISK SERVER#</processNodeName>
                    </callManager>
                </member>
            </members>
        </callManagerGroup>
    </devicePool>

    <sipProfile>
        <sipProxies>
            <registerWithProxy>true</registerWithProxy>
        </sipProxies>
        <sipCallFeatures>
            <cnfJoinEnabled>true</cnfJoinEnabled>
            <rfc2543Hold>false</rfc2543Hold>
            <callHoldRingback>2</callHoldRingback>
            <localCfwdEnable>true</localCfwdEnable>
            <semiAttendedTransfer>true</semiAttendedTransfer>
            <anonymousCallBlock>2</anonymousCallBlock>
            <callerIdBlocking>2</callerIdBlocking>
            <dndControl>0</dndControl>
            <remoteCcEnable>true</remoteCcEnable>
        </sipCallFeatures>

        <sipStack>
            <sipInviteRetx>6</sipInviteRetx>
            <sipRetx>10</sipRetx>
            <timerInviteExpires>180</timerInviteExpires>
            <timerRegisterExpires>3600</timerRegisterExpires>
            <timerRegisterDelta>5</timerRegisterDelta>
            <timerKeepAliveExpires>120</timerKeepAliveExpires>
            <timerSubscribeExpires>120</timerSubscribeExpires>
            <timerSubscribeDelta>5</timerSubscribeDelta>
            <timerT1>500</timerT1>
            <timerT2>4000</timerT2>
            <maxRedirects>70</maxRedirects>
            <remotePartyID>true</remotePartyID>
            <userInfo>None</userInfo>
        </sipStack>

        <autoAnswerTimer>1</autoAnswerTimer>
        <autoAnswerAltBehavior>false</autoAnswerAltBehavior>
        <autoAnswerOverride>true</autoAnswerOverride>
        <transferOnhookEnabled>false</transferOnhookEnabled>
        <enableVad>false</enableVad>
        <preferredCodec>g711ulaw</preferredCodec>
        <dtmfAvtPayload>101</dtmfAvtPayload>
        <dtmfDbLevel>3</dtmfDbLevel>
        <dtmfOutofBand>avt</dtmfOutofBand>
        <alwaysUsePrimeLine>false</alwaysUsePrimeLine>
        <alwaysUsePrimeLineVoiceMail>false</alwaysUsePrimeLineVoiceMail>
        <kpml>3</kpml>
        <natEnabled>false</natEnabled>
        <phoneLabel>#PHONE NAME#</phoneLabel>
        <stutterMsgWaiting>0</stutterMsgWaiting>
        <callStats>false</callStats>
        <silentPeriodBetweenCallWaitingBursts>10</silentPeriodBetweenCallWaitingBursts>
        <disableLocalSpeedDialConfig>false</disableLocalSpeedDialConfig>
        <startMediaPort>#RTP START PORT#</startMediaPort>
        <stopMediaPort>#RTP END PORT#</stopMediaPort>

        <sipLines>
            <line button="1">
                <featureID>9</featureID>
                <featureLabel>#EXT NUM#</featureLabel>
                <proxy>USECALLMANAGER</proxy>
                <port>#SIP PORT#</port>
                <name>#EXT NUM#</name>
                <displayName>#EXT NAME#</displayName>
                <autoAnswer>
                    <autoAnswerEnabled>2</autoAnswerEnabled>
                </autoAnswer>
                <callWaiting>3</callWaiting>
                <authName>#SIP AUTH NAME#</authName>
                <authPassword>#8 CHAR PASSWORD#</authPassword>
                <sharedLine>false</sharedLine>
                <messageWaitingLampPolicy>1</messageWaitingLampPolicy>
                <messagesNumber>#VM NUM#</messagesNumber>
                <ringSettingIdle>4</ringSettingIdle>
                <ringSettingActive>5</ringSettingActive>
                <contact>#EXT NUM#</contact>
                <forwardCallInfoDisplay>
                    <callerName>true</callerName>
                    <callerNumber>true</callerNumber>
                    <redirectedNumber>false</redirectedNumber>
                    <dialedNumber>true</dialedNumber>
                </forwardCallInfoDisplay>
            </line>

            <line button="2">
                <featureID>9</featureID>
                <featureLabel>#EXT NUM#</featureLabel>
                <proxy>USECALLMANAGER</proxy>
                <port>#SIP PORT#</port>
                <name>#EXT NUM#</name>
                <displayName>#EXT NUM#</displayName>
                <autoAnswer>
                    <autoAnswerEnabled>2</autoAnswerEnabled>
                </autoAnswer>
                <callWaiting>3</callWaiting>
                <authName>#SIP AUTH NAME#</authName>
                <authPassword>#8 CHAR PASSWORD#</authPassword>
                <sharedLine>false</sharedLine>
                <messageWaitingLampPolicy>1</messageWaitingLampPolicy>
                <messagesNumber>#VM NUM#</messagesNumber>
                <ringSettingIdle>4</ringSettingIdle>
                <ringSettingActive>5</ringSettingActive>
                <contact>#EXT NUM#</contact>
                <forwardCallInfoDisplay>
                    <callerName>true</callerName>
                    <callerNumber>true</callerNumber>
                    <redirectedNumber>false</redirectedNumber>
                    <dialedNumber>true</dialedNumber>
                </forwardCallInfoDisplay>
            </line>
        </sipLines>

        <voipControlPort>#SIP PORT#</voipControlPort>
        <dscpForAudio>184</dscpForAudio>
        <ringSettingBusyStationPolicy>0</ringSettingBusyStationPolicy>
        <dialTemplate>dialplan.xml</dialTemplate>
    </sipProfile>

    <commonProfile>
        <phonePassword></phonePassword>
        <backgroundImageAccess>true</backgroundImageAccess>
        <callLogBlfEnabled>1</callLogBlfEnabled>
    </commonProfile>

    <loadInformation>SIP41.9-4-2SR3-1S</loadInformation>
    <vendorConfig>
        <disableSpeaker>false</disableSpeaker>
        <disableSpeakerAndHeadset>false</disableSpeakerAndHeadset>
        <pcPort>0</pcPort>
        <settingsAccess>1</settingsAccess>
        <garp>0</garp>
        <voiceVlanAccess>0</voiceVlanAccess>
        <videoCapability>0</videoCapability>
        <autoSelectLineEnable>0</autoSelectLineEnable>
        <webAccess>0</webAccess>
        <spanToPCPort>1</spanToPCPort>
        <loggingDisplay>1</loggingDisplay>
        <loadServer></loadServer>
        <sshAccess>0</sshAccess>
    </vendorConfig>

    <versionStamp>001</versionStamp>
    <networkLocale>United_Kingdom</networkLocale>
    <networkLocaleInfo>
        <name>United_Kingdom</name>
        <uid>64</uid>
        <version>1.0.0.0-4</version> 
    </networkLocaleInfo>

    <deviceSecurityMode>1</deviceSecurityMode>
    <authenticationURL></authenticationURL>
    <servicesURL></servicesURL>
    <transportLayerProtocol>2</transportLayerProtocol>
    <certHash></certHash>
    <encrConfig>false</encrConfig>
    <dialToneSetting>2</dialToneSetting>
</device>

Copy and paste this into a text editor and search and replace the following:

  • #IP ADDRESS OF AN NTP SERVER#  –  with  –  the IP address of an NTP server
  • #SIP PORT FROM YOUR ASTERISK SERVER#  –  with  –  the SIP port of your asterisk server is listening on.  Probably 5060
  • #IP ADDRESS OF YOUR ASTERISK SERVER#  –  with  –  the IP address of your Asterisk server
  • #PHONE NAME#  –  with  –  the text you want to appear at the top right of the phone screen
  • #RTP START PORT#  –  with  –  the RTP port range start from the previous stage
  • #RTP END PORT#’  –  with  –  the RTP port range end from the the previous stage
  • #EXT NUM#  –  with  –  the Asterisk extension number as configured in the previous stage
  • #SIP PORT#  –  with  –  the SIP port of your Asterisk server.  Probably 5060
  • #EXT NAME#  –  with  –  the name you want to give this extension
  • #SIP AUTH NAME#  –  with  –  the username for the SIP extension as configured in Asterisk
  • #8 CHAR PASSWORD#  –  with  –  the password for the SIP extension as configured in Asterisk
  • #VM NUM#  –  with  –  the number you dial for Voicemail.  Probably *98

Note that this config file has two lines configured.  If you just blindly search and replace you’ll end up with two extensions configured the same.

Some comments on what some of the XML tags do:

  • ipAddressMode – 0 is IP v4 only. But this seems to have little effect.
  • registerWithProxy – true – Registers the device with Asterisk, this allows incoming calls to be sent to the phone.  If you’re getting “Unregistered” message on the screen, check you have this set.
  • featureId – 9 is SIP
  • autoAnswerEnabled – 2 – 2 seems to be “off”
  • webAccess – 0 – 0 is on (?!)
  • sshAccess -0 – ditto
  • versionStamp – bump this up every time you make a change.  Something like YYYMMDD001..2..3 etc
  • networkLocale – United_Kingdom – sets the tones to UK, see the optional extras section for more info.
  • transportLayerProtocol – 2 is UDP, 1 is TCP
  • dialToneSettings – 2 is “always use internal dialtone”.  See option extras for more info.

Edit this file as necessary and then save it to the root of your TFTP server with the filename: SEP<MAC>.cnf.xml.  If your phone MAC address was aa:bb:33:44:55:66 then the filename would be: SEPAABB33445566.cnf.xml  Note that it’s case sensitive, letters in the MAC address should be in upper case the extensions should be in lowercase.  You can get the MAC address for the phone from the syslog on your dnsmasq server.

If your phone is still in “Unprovisioned” mode it will have been asking for this config file repeatedly.  Once you save the file you should see the phone reboot shortly afterwards.  It may download the firmware again for some reason, just leave it to get on with it.

Make a call!

If everything has worked you should see your extension listed on the right hand side of the screen near the buttons, and the name of the phone should appear at the top of the screen.  If the icon next to the line buttons is that of a phone without an x through it, then you’re probably good to go!  Press the line button and see if you get a dial tone.  If not, then check the phone logs:

  • Press Settings
  • Press 6
  • Press 1

From these logs you should be able to tell if the phone has loaded your config correctly.  Errors about “updating locale” or “no trust list installed” can be ignored.  If there is a problem with the config file itself a generic error will be listed here.  If the phone won’t load the config file the most likely reason is that there is a typo in your XML file.  Good luck finding it.  You can SSH in to the phone to get more detailed logs and debugging information, but I haven’t tried this yet.  Google is your friend.

Optional Extras

Dial plan

The dial plan tells the phone how to process the digits you type and when to start sending the call.  Without a dial plan the phone simply waits a period of time for you to stop typing numbers before I decides you’re done and starts the call.  By using a dial plan you can reduce the amount of time spent waiting after you’ve finished keying in the number.  Here’s an example plan I’ve edited based on this post on Phil Lavin’s blog (Thanks Phil!) http://phil.lavin.me.uk/2012/11/united-kingdom-dial-plan-xml-for-cisco-phones/

<DIALTEMPLATE>
    <TEMPLATE MATCH="999" Timeout="0"/> <!-- Emergency -->
    <TEMPLATE MATCH="112" Timeout="0"/> <!-- Emergency -->
    <TEMPLATE MATCH="0500......" Timeout="0"/> <!-- Apparently 0500 is always 10 digits -->
    <TEMPLATE MATCH="0800......" Timeout="0"/> <!-- Apparently 0800 is always 10 digits -->
    <TEMPLATE MATCH="00*" Timeout="5"/> <!-- International, 00 prefixed. No fixed length -->
    <TEMPLATE MATCH="0.........." Timeout="0"/> <!-- UK 11 digit, 0 prefixed -->
    <TEMPLATE MATCH="26...." Timeout="0"/> <!-- My local STD numbers start 26 -->
    <TEMPLATE MATCH="\*.." Timeout="0"/> <!-- Asterisk *.. codes -->
    <TEMPLATE MATCH="\*98...." Timeout="0"/> <!-- Asterisk direct VM access *981234-->
    <TEMPLATE MATCH="1..." Timeout="0"/> <!-- Internal numbers -->
    <TEMPLATE MATCH="2..." Timeout="0"/>  <!-- Internal numbers -->
    <TEMPLATE MATCH="*" Timeout="5"/> <!-- Anything else -->
</DIALTEMPLATE>

Save this to the root of your TFTP server, named “dialplan.xml” (lowercase).

Ring tones

Everyone likes novelty ringtones.  You can find plenty of ringtones in a format which is compatible with your phone (raw format, 8000 Hz sample rate, 8 bit, ulaw, max 2 seconds).  These files need to be placed in to the root of your TFTP server.  I tried putting them in a sub-directory but it didn’t work.  Then you need to create a file called “ringlist.xml” also in the root of the server.  The format of this file is:

<CiscoIPPhoneRingList>
    <Ring>
        <DisplayName>#DISPLAY TEXT#</DisplayName>
        <FileName>#FILENAME#</FileName>
    </Ring>
    <Ring>
        <DisplayName>#DISPLAY TEXT#</DisplayName>
        <FileName>#FILENAME#</FileName>
    </Ring>
</CiscoIPPhoneRingList>

Filenames are case sensitive.  Once you’ve save this file, copy it to “distinctiveringlist.xml” as well.  This will allow you to set ring tones for the default ringer and different rings for each line.

Dial tones

By default the 7941 will have a psuedo North American dial tone.  This is annoyingly shrill (yes, it is).  By specifying a NetworkLocale in the phone config we can get it to load a different set of informational tones from a file stored in (per the example XML above) United_Kingdom.  In the root of the TFTP server create a directory called United_Kingdom.  In this directory you need to create a file called g3-tones.xml.  Bizarrely Cisco require you to have a support contract in order to download the correct tones settings for your country, despite giving the phone firmware away for free.  Go figure.  So this means I’m not going to paste the XML here.  If you search hard enough you’ll find an example g3-tones.xml file you can use as a base.  In our phone configuration above we told the phone to always use the internal dialing tone, so this means we only need to change the idial section of the tones file.  The magic numbers are:

  • 31538
  • -780
  • 30831
  • -973

Wallpaper

The phone comes with a single default wallpaper with horizontal lines on it.  This is easily replaced by your own designs with a simple PNG.   Create a directory in the root of the TFTP server called Desktops.  In here create another directory called 320x196x4.

In to this directory you need to place a “List.xml” file:

<CiscoIPPhoneImageList>
    <ImageItem Image="TFTP:Desktops/320x196x4/ubuntu-tn.png"
       URL="TFTP:Desktops/320x196x4/ubuntu.png"/>
</CiscoIPPhoneImageList>

The “-tn” in the file is a smaller thumbnail version of the larger image.  The PNGs need to be sized exactly 320×196 for the large and 80×49 for the thumbnail.  Here’s something to get you started:

Telephone Directory

You will have noticed that the phone has a “Directories” button and a “Services” button.  I haven’t managed to add an extra phone book to the Directories button yet although I think it’s certainly possible, just that the XML file refuses to do anything.  However, I have got a phone directory working on the Services button.

In the main phone config file there is a tag for “servicesURL”.  Point this to a web server on your local network which will serve up an XML file.  For example:

 <servicesURL>http://192.168.1.1/phone/directory.xml</servicesURL>

Assuming you are using Apache 2 to serve that XML file (or it could equally be a CGI script which generates the XML dynamically from a database such as the FreePBX phone book) the format looks like this:

<CiscoIPPhoneDirectory>
   <Title>Whizzy Towers</Title>
   <DirectoryEntry>
       <Telephone>1500</Telephone>
       <Name>Lenny</Name>
   </DirectoryEntry>
   <DirectoryEntry>
       <Telephone>1234</Telephone>
       <Name>Speaking Clock</Name>
   </DirectoryEntry>
</CiscoIPPhoneDirectory>

Important note:  You must tell Apache to serve those files as type “text/xml“.  “application/xml” will not work.

You can do this via your CGI script, or if you want to serve a static file add something like this to your Apache config:

 <Location /phone/>
     ForceType text/xml
 </Location>

Inside your VirtualHost section.

Final Tip

Watch /var/log/syslog on the machine running the TFTP server.  You’ll be able to see exactly what files the phone is asking for.  Bear in mind that it does ask for files it doesn’t strictly need, so don’t worry too much about file not found errors unless it’s one of the above.

Here’s a final video showing the boot up for a fully configured phone

DHCP clients not registering hostnames in DNS automatically

To remind myself as much as anything:

I run a dnsmasq server on my router (which is a Raspberry Pi 2) to handle local DNS, DNS proxying and DHCP. For some reason one of the hosts stopped registering its hostname with the DHCP server, and so I couldn’t resolve its name to an IP address from other clients on my network.

I’m pretty sure it used to work, and I’m also pretty sure I didn’t change anything – so why did it suddenly stop? My theory is that the disk on the client became corrupt and a fsck fix removed some files.

Anyway, the cause is that the DHCP client didn’t know to send it’s hostname along with the DHCP request.

This is fixed by creating (or editing) /etc/dhcp/dhclient.conf and adding this line:

send host-name = gethostname();

 

Multipath routing on a Raspberry Pi 2

Skill level:  Not for the faint hearted!

A few years ago, when I started working at home, I had a second ADSL line installed so that I could still get online if my ISP had an outage.  As well as fault tolerance I wanted to try and use all the available bandwidth rather than just have it sitting there “just in case”.  I achieved this using multi path routing and documented the solution here:  Over Engineering FTW.

This has been running really well on a Raspberry Pi for about 3 years (with an older kernel, see later in this post for why) but recently the SD card has started to fail.  Although this would be easy to fix; simply replace the SD card and copy my scripts over, the rural town I live in has just been upgraded to FTTC and so my connection speed has gone from about 8 Mbps to about 70 Mbps on each line.  The first generation Pi doesn’t have enough horsepower to cope with 70 Mbps let alone 140Mbps, and indeed the ethernet interface is only 100Mbps.  I had a Raspberry Pi 2 spare anyway so I figured I would use that and add a second gigabit NIC so I could cope with the theoretical 140 Mbps connection to the internet, and since I had two NICs I might as well use both of them.

Physical layout

This is what I came up with:

New network config

 

 

 

 

 

 

 

 

 

  • Two lines coming from the cabinet to my house, one with Plusnet and one with TalkTalk
  • The Plusnet line:
    • It came with an OpenReach vDSL bridge and a crappy locked down router, so I chucked the router away and used PPPoE tools to bring up the PPP connection
    • The vDSL bridge talks to the Raspberry Pi over a VLAN to keep it separated from the other noise on the switch
    • Interface eth1.1000 is an unnumbered interface and ppoeconf uses a layer 2 discovery protocol to find the bridge
    • Once the PPP connection is established ppp1 can be used to route traffic to the internet
  • The TalkTalk line:
    • It too came with a crappy router, but no OpenReach bridge.  So I had to use it.
    • The TalkTalk router talks to the Raspberry Pi over VLAN 10.  Those ports are untagged on the switch, so as far as everyone on that network knows its just a self contained LAN.
    • Interface eth0 on the Raspberry Pi has an address on that LAN and uses the TalkTalk router to talk to the internet
  • The main LAN:
    • Interface eth1 is used to connect to the main LAN
    • Clients on the LAN use the Raspberry Pi as their default gateway

With me so far?  Essentially we have the normal eth0 interface of the Pi connected to one LAN with its own router and eth1 (a USB gigabit ethernet adapter) has a tagged VLAN for connection to the OpenReach bridge (eth1.1000) and an untagged default network for connecting the the main LAN.  Once the layer 2 connection with the bridge is established a PPP connection becomes the second route to the internet.

The death of route caching

Around version 3.6 of the Linux kernel “route caching” was removed.  With route caching in place you could set up a default route with multiple hops, something along the lines of:

ip route add default nexthop via 192.168.1.254 dev eth0 nexthop via 192.168.2.254 dev eth1

When a packet needed routing to the internet the kernel would do a round-robin selection of which route to use and then remember that route for a period of time.  The upshot of this was, for example, that if you connected to www.bbc.co.uk and got routed first via 192.168.1.254 and so SNATed to 212.159.20.70 then all subsequent traffic for that destination also got routed via the same route and had the same source IP address.  Without route caching the next packet to that same destination would (probably) use the other route, and in the case of my home user scenario would arrive from a different source IP address – my two internet connections having different IP addresses.  Although HTTP is a connectionless protocol this change of IP address did seem to freak some services out.  For protocols with connections the story is worse, e.g. packets of an SSH connection would arrive at the far end from from two different IP addresses and probably get dropped.  Route caching was a simple fix for this issue and worked well, as far as I was concerned anyway.

Im sure the reasons to remove it are valid, but for my simple use case it worked very well and the alternative, and now only option is to use connection marking to simulate the route caching.  When I first looked at it I was baffled and thought I would just go back to a pre 3.6 kernel and use route caching again.  But, in the standard Raspbian distro there isn’t a kernel old enough for the Raspberry Pi 2 to make use of it.

So I was stuck…  I had to use a Raspberry Pi 2 to get enough packet throughput to max out my internet connections, and I couldn’t use route caching because there wasn’t a kernel old enough.  This meant I was going to have to either compile my own kernel or learn to use connection marking.  Joy.

Alternative projects

The documentation for Netfilter is extensive but I found a lot of it to be out of date and very hard to grok.  I found a few projects who had already implemented connection tracking/marking namely FWGuardian and Fault Tolerant Router.

FWGuardian is, as far as I can tell, designed for something orthogonal to my set up.  Where you might have lots of connections coming in to a server, or a number of offices which need to connect to other offices via pre-defined routes.  I played around with it for a while, and Humberto very kindly offered me support over email, but ultimately it was too involved and complex for my needs.  You should check out the project though if you have advanced requirements.  It’s got some brilliant features for a more enterprise oriented setup.

Fault Tolerant Router is a much simpler setup and matched my requirements very closely.  At it’s core it’s a Ruby script which can write your iptables rules and routing tables and constantly monitor the links.  If one goes down it can dynamically rewrite your rules and direct all traffic down the working connection.  However, it’s not expecting to use a PPP connection where gateways can change and it’s not really been tested with VLANs, although in practice it handled VLANs just fine.

But, at the end of the day, I wanted to learn how to do this myself and so I used the rules generated by Fault Tolerant Router to understand how connection marking was supposed to work and then started to implement my own home-grown solution for teh lolz.

Multi-path routing and connection marking

As I understand it, the idea with connection marking, or connection tracking – I’m not sure what the difference is, is that when a new conversation starts the packets are marked with an identifier.  You can then set ip rules to dictate which route packets with a particular mark take.  In essence once a new connection is established and a route selected, all other packets in that conversation take on the same mark and so the same route.  This emulates the route caching of the past.  I don’t really get how, in the case of an HTTP conversation (or flow) which is connectionless, all the packets in the conversation get marked the same.  This page has some more details, but I haven’t read it properly yet.  Anyway, we don’t know HOW it works, but it does.  Good enough.

IPtables

First of all we need to create the iptables configuration to set up connection marking.  Here’s the relevant extract from the iptables.save file:

*mangle
 :PREROUTING ACCEPT [0:0]
 :POSTROUTING ACCEPT [0:0]
 :OUTPUT ACCEPT [0:0]
 :INPUT ACCEPT [0:0]
 [0:0] -A PREROUTING -i eth1 -j CONNMARK --restore-mark
 [0:0] -A PREROUTING -i ppp1 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
 [0:0] -A PREROUTING -i eth0 -m conntrack --ctstate NEW -j CONNMARK --set-mark 2
 [0:0] -A POSTROUTING -o ppp1 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
 [0:0] -A POSTROUTING -o eth0 -m conntrack --ctstate NEW -j CONNMARK --set-mark 2

-i = –in-interface and -0 = –out-interface

These rules set a mark depending on which interface is used.  These changes happen in the mangle table.

Packets going in or out the WAN via ppp1 or eth0 which are a new connection are marked with a 1 or a 2 depending on which interface they use.  The decision about which route to use is done in the rules which we will see later.  Any packets coming in to eth1, so from the LAN, have their marks restored on the way in so they can be dealt with accordingly.

Now let’s have a look at the filter table:

*filter
 :INPUT DROP [0:0]
 :FORWARD DROP [0:0]
 :OUTPUT ACCEPT [0:0]
 :LAN_WAN - [0:0]
 :WAN_LAN - [0:0]
[0:0] -A INPUT -i lo -j ACCEPT
 [0:0] -A INPUT -i eth1 -j ACCEPT
 [0:0] -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
[0:0] -A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
 [0:0] -A FORWARD -i eth1 -o ppp1 -j LAN_WAN
 [0:0] -A FORWARD -i eth1 -o eth0 -j LAN_WAN
 [0:0] -A FORWARD -i ppp1 -o eth1 -j WAN_LAN
 [0:0] -A FORWARD -i eth0 -o eth1 -j WAN_LAN
## Clamp MSS (ideal for PPPoE connections)
 [0:0] -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
 [0:0] -A LAN_WAN -j ACCEPT
 [0:0] -A WAN_LAN -j REJECT

The default policy is set to DROP, so any packet not matching one of the rules are dropped.

INPUT applies to packets which are bound for the router itself.  Packets from the local interface are allowed, and packets from eth1 (the main LAN) are also allowed.

FORWARD applies to packets which are passing through the router on their way somewhere else.  Packets which are known to be part of an already in-progress session are allowed.  Packets are then categorised as LAN to WAN or WAN to LAN and dealt with by the rules LAN_WAN or WAN_LAN, getting accepted and rejected respectively.  All this boils down to LAN clients using the Raspberry Pi as a router and so having their packets forwarded are allowed out and packets coming in from the internet are rejected, the exception being if they are part of an on-going connection.

Clamping MSS to MTU deals with a particular issue with using PPPoE connections where the MTU can’t be the usual 1500 bytes.  Because a lot of ISPs block the ICMP messages that would normally deal with asking the client to send smaller packet sizes we use this handy trick to make sure that packets can go out unfragmented.  If you find that some web pages are slow to load and others are not, then try switching this on.  If you’re only using upstream ISP provided routers you probably don’t need this.

Lastly in iptables we enable SNAT or masquerading so that connections out to the internet appear to come from a valid internet routable IP address not our LAN IP address:

#SNAT: LAN --> WAN
 [0:0] -A POSTROUTING -o ppp1 -j SNAT --to-source 212.159.20.70
 [0:0] -A POSTROUTING -o eth0 -j SNAT --to-source 192.168.1.253

Routing tables

We’ve configured iptables to add a mark to traffic depending on which WAN interface it is going in or out of.  But this is only marking the packets, there is no logic to make sure that packets of the same mark use the same route.  To make this happen we use ip rules.

First create three new routing tables by editing /etc/iproute2/rt_tables.  I’ve added this to the bottom:

1 plusnet
 2 talktalk
 3 loadbal

Now we add a default route to the first two of those tables:

ip route add default via $PPP_GATEWAY_ADDRESS dev ppp1 src 212.159.20.70 table plusnet
ip route add default via 192.168.1.254 dev eth0 src 192.168.1.253 table talktalk

$PPP_GATEWAY_ADDRESS is set when the PPP session is established and changes.  We can look at ways to find that address later, but for now just substitute the “P-t-P” IP address from “ifconfig ppp1” or whatever your ppp interface number is, or in the case of an ISP-provided router, the LAN side IP of that router.

This is simply creating a routing table with the name of the ISP that will be used and a default route which can find its way to the internet for that ISP.

Next we create the loadbal routing table which is a combination of the previous two:

ip route add default table loadbal nexthop via $PPP_GATEWAY_ADDRESS dev ppp1 nexthop via 192.168.1.254 dev eth0

which is the same idea as we used in the old route caching days, a round-robin route which flicks between the two available routes to the internet.

ip rules

We’ve now created the iptables entries to track and mark traffic from each of the two ISPs and add some basic firewalling and IP masquerading.  We’ve also created a routing table for each ISP and a load-balancing table which splits the traffic between the two ISPs.

Now we need to create some rules to govern which of the routing tables is used for a particular connection.  The commands to do this are:

ip rule add from $PPP_IPADDR table plusnet pref 40000
ip rule add from 192.168.1.253 table talktalk pref 40100
ip rule add fwmark 0x1 table plusnet pref 40200
ip rule add fwmark 0x2 table talktalk pref 40300
ip rule add from 0/0 table loadbal pref 40400

The rules are matched in numerical order based on preference and once a rule matches that’s it.  The first two rules make sure that traffic from the routers uses the correct table.

The important rules are the last three.  Traffic which has been marked “1” will always use the plusnet routing table, traffic marked as “2” will always use the talktalk routing table.  This ensures that all traffic which is part of an on-going conversation will always use the same router out to the internet, and so always come from the same IP address.

The last rule only matches traffic which is not already marked i.e. new conversations.  This routing table, as can be seen in the previous section, has a multi-path route to balance traffic between the two routes out.  Once a conversation is established the IPtables conntrack rules will mark the traffic and so one of the two fwmark rules will match.

Now delete the main default route so that the above rules don’t get bypassed with a route in the “main” table:

ip route del default

And that’s it.  You should now have a router which splits the traffic fairly evenly across two internet connections and keeps tabs on which packets should go out of which routers.  I’ve had this running for a month or so now, and it seems to be working fine.  I’ve had the Pi lock up a couple of times, but I think that’s related to the USB gigabit ethernet adapter.

Smart Netflix hacks

Services such as unblock-us allow you to work around some geographic content blocks by acting as your DNS server and replying with the IP address of, say, the US based Netflix server instead of the UK ones.  I’ve installed dnsmasq on my Pi as well and configured it to use the Unblock DNS servers instead of my ISP or Google servers.  The clients on the LAN get their network configuration over DHCP from the Pi which sets the DNS server address for the clients to the Pi itself which then handles DNS lookups using the Unblock servers upstream.  This works really well for most Netflix clients but I was having a lot of problems getting the Chromecast to work with Netflix and Unblock US.

It turns out that Google have hard-coded it’s own DNS servers into the Chromecast and so your local DNS settings are ignored.  Nice one Google.

Because we’re using a Linux box as our router we can do this:

iptables -t nat -A PREROUTING -s <Netflix Client IP>/32 -d 8.8.8.8 -p udp --dport 53 -j DNAT --to <Alternative DNS Server IP Address>
 iptables -t nat -A PREROUTING -s <Netflix Client IP>/32 -d 8.8.4.4 -p udp --dport 53 -j DNAT --to <Alternative DNS Server IP Address>

Using the NAT table we rewrite the DNS lookup bound for Google’s DNS servers to send it to our dnsmasq server instead. lol.

Spreading interrupts across cores

Network cards have queues for tx and rx.  Higher end cards will typically have more queues, but on the Pi the on-board NIC (which is actually connected via USB) has one for tx and one for rx, as do the VLAN interfaces and the PPP interfaces.  Each of these queues has a CPU affinity and it seems that by default the queues all use the same CPU core.

When downloading an ISO with BitTorrent and the load-balancing set up I was able to achieve just over 10 MBytes a second.  But the Pi became really unresponsive.  Looking at top showed one CPU core maxed out in soft interrupts:

without_queues_spread

 

 

 

 

By adjusting the CPU affinity to spread these IRQs across multiple CPUs I squeeze out a tiny bit more network throughput, but more usefully the Pi remained responsive under heavy load:

with_queues_spread

 

 

 

 

 

The commands I used to do this are:

echo 1 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 1 > /sys/class/net/eth0/queues/tx-0/xps_cpus
echo 2 > /sys/class/net/eth1/queues/tx-0/xps_cpus
echo 2 > /sys/class/net/eth1/queues/rx-0/rps_cpus
echo 4 > /sys/class/net/eth1.1000/queues/tx-0/xps_cpus
echo 4 > /sys/class/net/eth1.1000/queues/rx-0/rps_cpus
echo 8 > /sys/class/net/ppp1/queues/tx-0/xps_cpus
echo 8 > /sys/class/net/ppp1/queues/rx-0/rps_cpus

Source

Here’s a tgz file containing my iptables rules and a script to set up the above: routing

Update:  I’ve put the files in this Github repo:  https://github.com/8none1/multipathrouting

If you’re interested in helping to make the scripts a bit more generic and adding fault-tolerance let me know.

Snapping Mosquitto MQTT broker

As part of my ever expanding home automation system I wanted to use MQTT to publish data on my network. With the release of the Raspberry Pi 2 I can run Ubuntu Core to create a reliable, secure and easily updated server which is a perfect fit for requirements of an MQTT broker and general HA controller. I asked some Ubuntu friends to help me package Mosquitto as a Snap, and in return I would write down how we did it. Here’s the story…

Start by reading this: https://developer.ubuntu.com/en/snappy/

In summary; a Snappy application is secure because it’s wrapped with AppArmor. It’s easier to install and upgrade because everything is packaged in a single file and installed to a single location. That location is backed-up before you install a new version, and so if the installation goes wrong you can revert to the previous version easily by copying the original files back (or rather, Snappy will do all of that for you). Simplifying things slightly there are two types of Snappy “application”: Apps and Frameworks. Frameworks can extend the OS and provide a mediation layer to access shared resources. Apps are your more traditional top-level items which can use the provided frameworks, or bundle everything they need in to their Snap. This makes things much easier for app providers because they are now in charge – they can be assured that no library will change underneath them.  This is a huge benefit!

Let’s get Mosquitto snapped.

1. Install QEMU to run an Ubuntu Core machine

http://www.ubuntu.com/cloud/tools/snappy#snappy-local

First we install the KVM hypervisor:

sudo apt-get install qemu-kvm

Then check everything is as it should be with:

kvm-ok

Now download the latest Ubuntu Core image from here: http://cdimage.ubuntu.com/ubuntu-core/preview/  At the time of writing this is the newest x86-64 image: http://cdimage.ubuntu.com/ubuntu-core/preview/ubuntu-core-alpha-02_amd64-virt.img

Then launch the virtual machine. This command port forwards 8022 on your local machine to 22 on the virtual machine, so you can SSH to port 8022 on localhost and actually connect to the Ubuntu Core machine. It gives the Core machine 512MB of RAM, nicely achievable on a modest budget (The Pi2 has 1 GB).  We also forward port 1883 from to the VM, which will allow us to connect to the Mosquitto server on our VM once it’s all installed.

kvm -m 512 -redir :8022::22 -redir :1883::1883 <vm image file>

Once it’s booted you can connect to it with SSH. The username and password are “ubuntu”.

ssh -p 8022 ubuntu@localhost

To make things a bit easier, why not use key authentication? On your host machine:

ssh-copy-id -p 8022 ubuntu@localhost

We should also upgrade our Ubuntu Core VM before we start.  SSH in to your box and run:

sudo snappy update
sudo reboot

2. Build Mosquitto in the right way

Back on your host (not the virtual machine you just created above) create some directories to hold the code and download the latest stable source and the build dependencies for Mosquitto:

sudo apt-get install build-essential cmake
sudo apt-get build-dep mosquitto
mkdir -p mosquitto/install mosquitto/build
cd mosquitto
wget http://mosquitto.org/files/source/mosquitto-1.3.5.tar.gz
tar xvzf mosquitto-1.3.5.tar.gz
cd build

Time to build Mosquitto.  Before you run the commands below, a bit of background information.  The cmake line will force cmake to install the binaries to the location specified with INSTALL_PREFIX, rather than /usr/local.  This is required to bundle all of the binaries and other files to the “install” directory we created above, making it possible to package as a Snappy.

cmake -DCMAKE_INSTALL_PREFIX=`readlink -f ../install/` ../mosquitto-1.3.5
make -j`nproc`

make install

nproc spits out the number of processor cores you have, so the make line above will use as many processor cores as you have available.  It’s not required, and for Mosquitto which is fairly small it’s not worth worrying about, but for a bigger job this is quite handy.

If you look in the “../install” directory you’ll see a familiar structure containing all the goodies needed by Mosquitto.

3. Find the libraries needed and copy them in to your Snappy project

Change in to the install/lib directory and use ldd to display the linked libraries for the two main .so files:

ldd lib/libmosquitto.so.1.3.5 lib/libmosquittopp.so.1.3.5 | grep '=>' | awk '{ print $1 }' | sort | uniq

This uses ldd to show the libraries required by Mosquitto, and then sorts them in to a nice list. You’ll see something like this:

libcares.so.2
libcrypto.so.1.0.0
libc.so.6
libdl.so.2
libgcc_s.so.1
libmosquitto.so.1
libm.so.6
libpthread.so.0
librt.so.1
libssl.so.1.0.0
libstdc++.so.6
linux-vdso.so.1

Now, on the Ubuntu Core machine we can run this little script:

for i in `cat`; do find /lib /usr/lib -name $i; done

Copy the list from the previous command to the clipboard and then paste it in to terminal where this command is running and hit Ctrl-D to submit the list.  The script will then search Ubuntu Core for the libraries required.  If it finds them they will be displayed, if it doesn’t then they are not available in Ubuntu Core by default and will need to be included in your Snappy package.

linux-vdso is the Linux kernel and is available on every Linux system by default, so we don’t need to provide that specifically.

libssl, libcrypto, libpthread, librt, libc and libdl are all available in Ubuntu Core by default – so we don’t need those either.

That leaves just libcares to be copied in to our package.

 cp /usr/lib/x86_64-linux-gnu/libcares.so.2.1.0 .

We should already be in the ‘lib’ directory, hence the ‘.’ above.  We are copying libcares in to the lib directory of our Snap, and when we run the Snap we will pass in the library path to make sure Mosquitto can find it.  More on this later.

4. Add the meta data required for the Snappy package

Reference: https://developer.ubuntu.com/en/snappy/guides/packaging-format-apps/

Create the meta data directory inside the install directory (change to the install directory, it should just be cd ..):

mkdir meta

Create the package.yaml file:

nano meta/package.yaml

And this is what we’re putting in it:

name: mosquitto.willcooke
architecture: amd64
version: 1.3.5
icon:
services:
 - name: mosquitto
 start: ./sbin/mosquitto.sh
ports:
 required: 1883

Information about these fields and what they mean is available in the reference linked to above, but they are easily understandable.  A comment on the name though, you need to append .<yournamespace> where your namespace is as you select in your Ubuntu myapps account.  One thing to mention, you can see that to start our Snap we are calling a shell script.  This allows us to pass in extra options to Mosquitto when it runs.

Next we need to create a readme file:

nano meta/readme.md

This file needs to contain at least a couple of non-blank lines.  Here’s what we put in it:

This is a Snappy package for Mosquitto MQTT broker.
Information about Mosquitto is available here:  http://mosquitto.org/
Information about MQTT is available here: http://mqtt.org/

We also need to configure our Mosquitto server, by editing the conf file.  Most of the settings can be left as default, so we will create a new conf file with only the bits in we need.

mv etc/mosquitto/mosquitto.conf etc/mosquitto/mosquitto.conf.ori
nano etc/mosquitto/mosquitto.conf

Add these two lines:

user root
persistence_location /var/apps/mosquitto/current/

We need to change this to run as root.  Since our Snap will be confined there is no risk here.  I expect the ability to run as non-root users when using Snappy will be improved, but really it’s not necessary.

We also need to add a small shell script to start Mosquitto with the right options.  Create a file in install/sbin called mosquitto.sh:

nano sbin/mosquitto.sh

And add this:

#!/bin/sh
LD_LIBRARY_PATH=./lib:$LD_LIBRARY_PATH exec ./sbin/mosquitto -c etc/mosquitto/mosquitto.conf

We are specifying where to find the extra libraries we require and where to find the conf file.  Make that file executable:

chmod +x sbin/mosquitto.sh

5. Build your Snappy package

Add the Snappy PPA to get the build tools, and then install them:

sudo add-apt-repository ppa:snappy-dev/beta
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt install snappy-tools

In your install directory run:

snappy build .

If you see an error about ImportError: No module named ‘click.repository’ then you likely have a clash between the Click library version in the SDK team PPA and the version in the Snappy PPA.  This will be fixed soon, but in the meantime I would suggest installing ppa-purge via apt-get and then running sudo ppa-purge ppa:ubuntu-sdk-team/ppa.

If you see an error about “expected <block end>” in the package.yaml check the whitespace in the file.  It’s likely a copy and paste error.

6. Install your Snappy package

Once you have your .snap file you can install it to your virtual machine like this:

snappy-remote --url=ssh://localhost:8022 install ./mosquitto_1.3.5_amd64.snap

 

7. Test your Snappy package

If everything has gone to plan Mosquitto should now be running on your virtual machine.  In order to test you’ll need to write a test Publisher and Subscriber.  I used the Python Paho library.

Here’s an example Publisher:

#!/usr/bin/python
import paho.mqtt.client as mqtt
from datetime import datetime
from time import sleep
def send_mqtt(topic, message):
 log("Sending MQTT")
 log("Topic: "+topic)
 log("Message: "+message)
 mqttc.reconnect()
 mqttc.publish(topic, message)
 mqttc.loop() 
 mqttc.disconnect()
print "Time server starting up...."
mqttc = mqtt.Client("python_pub")
mqttc.connect("localhost", 1883)
while True:
 tstr = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
 send_mqtt("/information/time",tstr)
 sleep(10)

And here’s an example Subscriber:

#!/usr/bin/python
import paho.mqtt.client as mqtt
import datetime
def on_connect(client, userdata, rc):
 print "Connected with result code "+str(rc)
 client.subscribe("#")
 
def on_message(client, userdata, msg):
 print "Topic: ", msg.topic+'\nMessage: '+str(msg.payload)
 
 
client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883, 60)
client.loop_forever()

 

What’s next?

We’ve built a Snappy package for amd64 (or whatever your native architecture is), but we really need to be cross-architecture to give people the best choice of platform on which to use the package.  This involves cross compiling, which can be tricky to put it mildly.

I spoke to Alexander Sack, the Director of Ubuntu Core, and asked what was coming next for Snappy and I was very excited to hear about easier cross-compilation methods as well as a cool script to help automate gathering the libraries in to your package.  I’ll find out more about these and follow up with another post about

Special Thanks

A huge “Thank You!” to Saviq and Didrocks for doing the actual work and letting me watch.

 

Where to get Snappy Mosquitto

amd64 version:  https://myapps.developer.ubuntu.com/dev/click-apps/ubuntu/1500/
armhf version: https://myapps.developer.ubuntu.com/dev/click-apps/ubuntu/1502/ (please note, I haven’t been able to test the ARM version because of a lack of hardware.  If it doesn’t work let me know and I can fix it.)

Raspberry Pi powered heating controller (Part 4)

Heating PCB

It’s really happening!  The breadboard prototype has been running the heating and hot water for a week or so now, and so far nothing has caught on fire.  The relays are happy, they’re not getting warm or anything, the Pi is turning things on and off when it’s supposed to, the REST API is working, and I’ve knocked up a quick Web interface to switch things remotely.  Last night for the first time I switched the heating on from the sofa, because I was a little bit cold.

Great success!

Borat

I’ve turned the breadboard layout in to a PCB design and today at 09:11 UTC it was sent off for manufacture, I should have it back in a week.  All the other components are on their way and so next weekend I should be able to put everything together and solder it to the board.  With a bit of luck it will work first time.

I’m quite please with the board layout, it’s a fairly useful breakout board for the Raspberry Pi.  In the future I would like to re-work it with smaller traces so that it can fit directly on top of the Pi.

The software to drive everything is rather basic, but I will make it available via Github anyway, it might come in handy.  I will be working on this over the coming months, so it should improve soon.  You’ll need a MySQL server set up.  The schema for the database is included in the repo.  The whole thing has grown organically and so the naming and structure is poor, but it works.

Code:  https://github.com/8none1/heating

I’ve starting making some 1wire temperature sensors which I will place in various rooms and hook up back to the Pi via cat5.

1wire_heat

I won’t be able to control individual radiators at the moment, but I can set a target temperature for a given room and rely on the TRVs to control the temperature in other rooms.

I’ll also be adding outside temperature sensors and looking to replace the main room stat with another Pi in the future.

In summary then, I should have the final thing built in the next couple of weeks.  More to follow when that happens.  I’m considering looking at moving the whole thing to a micro controller rather than a Pi, that would reduce the BOM quite considerably, and might even warrant a commercial product in the long run – a hackable heating controller.  Is this something people might be interested in?

Further Updates

 

Raspberry Pi powered heating controller (Part 3)

I’ve had all the parts hooked up on breadboard for a few weeks now, and in theory everything works.  I haven’t actually tested the prototype with the real heating system yet as it’s been cold and I don’t want to risk blowing anything up and having to deal with a cold house and an angry family.  Sometime in the next month or so I will do that, but as far as I can tell it will Just Work (lolz).

The final circuit design

The final circuit design

In the meantime I’ve been thinking about the software to drive the thing.

My programming skills are pretty basic so I have been trying to keep everything as simple as possible.   It became apparent that I was going to need some inter-process communication so that I could handle switching things on and off from outside the controller program (for web control etc).  This presented me with a problem as all the basic reading from files/FIFOs were blocking – in that they would sit and wait for a “command” to be received before continuing – and that was too limiting for my needs.  I was going to have to deal with some kind of threading.  This troubled me deeply.  The good people of Google+ suggested a few options and in the end I decided a RESTful interface was the way to go as it should be accessible from the widest range of other languages I might have to use, and especially easy from a web browser.

Originally I had imagined that I would have a single Python script to handle pretty much everything, including:

  • Switching the relays on/off
  • Responding to button presses
  • Proving a scheduling engine for standard “on at this time, off at this time” behaviours
  • Providing a way for other things to control the system
  • Monitoring temperatures around the house
  • Intelligent switching (e.g. it’s extra cold this morning, turn on early)

That way, I figured, I could just write one script and each function could interact with each other function very easily.  But following a conversation with Mark S. a few years ago, and then more recently with Stuart L. I was starting to think that the monolithic architecture was not the way to go and instead I should push the intelligence out to the edges and the main loop should just deal with the basic on/off functionality.  I was worried that if something went wrong with the electronics, for example the heating turned on but never off, then it would be difficult to spot from outside the main loop.  But, given that the heating system has a lot of built-in safety features like room stats, boiler stats and hot water stats which are all wired in series with the mains that won’t be a problem, and besides the current controller doesn’t do anything more intelligent than ON or OFF.

To that end I decided that I’d go with this architecture:

Heating System Architecture

The Relay Controller would handle switching the power on and off in the right order to the right outputs, it would handle the switches for manual override, provide feedback via LEDs, and it would provide a REST interface for anything else to control the system or find out what the current state is.  This way I can write a much more simple script to manage the scheduler for example.  It can run independently and then simply poke the relay controller at the right time.  I also decided that the relay controller would only have the ability to switch off after a certain period of time.  The default will be 60 mins.  This further simplifies things and puts just the right amount of intelligence in the core.  The scheduler just says “switch on now, and off again in 90 minutes”.  If the scheduler crashes the correct state is maintained by the controller and things just pick up from where they were when the scheduler starts again.  If the controller crashes then the power is removed from the system and it “fails safe”.

The REST API is simple:

  • /get/ch or hw/ – gets the current state of the system as a JSON object. e.g. “http://10.0.0.1/get/ch”
  • /set/ch or hw/on or off – switch the system on or off.  e.g. “http://10.0.0.1/set/ch/on” switches the central eating on for the default time (60 minutes)
  • /set/ch or hw/on/n – switch the system on for n minutes.  e.g. “http://10.0.0.1/set/hw/on/90” switches hot water on for 90 minutes

I used the Python SimpleHTTPServer, the ThreadedHTTPServer and threading.Thread.

I’m certain that it could be rewritten using proper OO methods and whatnot, and there is quite a lot of code which started out as an idea which then became redundant, but if you’re interested it’s available here:  piheat

Next on the agenda is to draw a PCB and get it made (once I’ve tested the circuit properly) and work on the scheduling engine.  At this rate I should be finished right around the time when we don’t need to use the heating any more.

Further Updates

 

Raspberry Pi powered heating controller (Part 1)

In which no Raspberry Pi’s are seen.

TL;DR:  It should be fairly straight forward to add a Raspberry Pi controlled heating and hot water system to a standard UK domestic set up and, more importantly, remove it again without messing with the existing set up.  As a minimum you’ll need a Raspberry Pi and 4 relays.  A few other bits and bobs wouldn’t go a miss though.  The theory checks out, I’ve ordered the bits, come back next time to see what it looks like.

It occurs to me that – for a long time we’ve had a thermostat in our homes which switches the heating off when it gets warm enough, but wouldn’t it be just as useful to have something which turns the heating on when it gets too cold?

This thought, together with a Raspberry Pi that wasn’t doing much and a strong desire to make my home more connected, led me to think about how I might control my heating system from, say, a smart phone.  I’m far from being the first person to think of this idea, and there are loads of really good examples out there, but none of them did quite what I wanted in the way I wanted to do it.  So I’m going to start from first principals and walk through this project design to try and build a removable & non-destructive add-on to an existing system.  I’m writing this at the very start of the project, so I’ve no idea if it will work, if I will break some expensive components on way to getting it working, or if I will just give up before I get to the end.  Let’s see.

Typical domestic hot water and heating systems

This may be UK specific.  Here is a very very crude diagram of a typical home set up:

A crude diagram of how the central heating system works in a typical UK home.

A crude diagram of how the central heating system works in a typical UK home.

The boiler burns gas and heats water.  That hot water is circulated around the system (called the primary circuit) by a pump and can do three jobs.  It can circulate through a heating element in a hot water cylinder and heat more water which is stored in the cylinder.  Note that the water which circulates through the element does not come in to contact with the actual water it is heating, the two are kept separate for water quality reasons.  The second job it can do is circulate through radiators in the home and hear the air.  The third job is to do both.  The hot water from the boiler moves around the primary circuit losing it’s heat to either the hot water in the cylinder or the air and eventually passes through the boiler again, heats up, and goes round and round again.

There are two “header” tanks of cold water in the loft.  One is for the cold water to the bathroom for flushing the loo, filling the bath, brushing your teeth, that kind of thing.  This tank also fills the hot water cylinder.  The other is the header tank for the primary system and ensures that it can’t boil dry.  Both use gravity and water pressure to make sure the water flows to where it is needed.

The system in the diagram is an “open” system.  If the hot water in the cylinder gets too hot it can expand up the vent pipe and dump itself in to the cold water tank.  The cold water tank can over flow to outside.  If the hot water in the primary gets too hot it can expand up in to the header tank ready to be reused to fill the primary when the water cools.

There is such a thing as a sealed pressurised system which doesn’t have these vents.  These are more complex and if you have one please be very careful in tinkering with the control mechanisms.  In an open system, if you get things wrong and the boiler runs and runs you would end up with a lot of steam in the loft.   In a pressurised system things can go pop and blast you with boiling water.  That said, in an open system you could still end up dumping a header tank full of boiling water down on to the bedrooms below.  People have died from this happening, so tinkering with the heating system is not something to be taken lightly.

In summary then; we have three things we can ask a system for:

  1. Make hot water
  2. Heat the house
  3. Make hot water and heat the house

And we have a number of key elements:

  1. Boiler
  2. Hot water cylinder
  3. Primary header tank
  4. Cold water tank

Typical electrical system to control hot water and heating

Once we understand how the wet bits fit together we can take a look at the electrical components:

Y Plan electrical wiring plan for central heating and hot water.

Y Plan electrical wiring plan for central heating and hot water.

There are multiple “standards” for wiring up a heating system.  You can find heaps of information on the excellent DIY FAQ wiki.

My system has been wired in the “Y Plan” configuration and if you have a single 3-port valve in your airing cupboard and a couple of tanks in your loft – then there is a good chance you have too.  I will run through the wiring, and some of the inherent safety systems built in (which is why I’m keen to make sure my controller is a simple replacement for the existing controller, and is not a complete re-wire).  Before we start though, a further word of caution.  Mains electricity is lethal.  You need be comfortable playing with this stuff to consider attempting anything to do with the heating system.  It’s also probably illegal in UK due to some draconian restrictions on what a home owner can and can not do to the wiring in their own home.  Don’t try this at home kids.  A competent tradesman might be able to help you hook it all together.

The incoming mains supply goes through a double pole switch which will disconnect live and neutral when switched off.  In this diagram, the live feed provides power to only the controller (sometimes you might see a parallel (switched and fused) connection to the boiler from that live).  So first and foremost, all power to the components comes through the controller. Neutral is common to pump, boiler and valve and so is earth.

Thermostats are placed in series for both the hot water circuits and the heating circuits.  These will physically break the circuit when a specific temperature is reached.

Let’s consider this example:  I tell the controller to heat the water.  It connects the live feed to the “HW ON” cable via point 6 on the diagram. The current flows to the cylinder stat, which allows the current through since the temperature is lower than the trigger point it is set to.  The pump and the boiler are connected in parallel so you can’t run the boiler with out the pump running too (at least that’s the plan), and they are provided power via the room stat to point 8 on the diagram.  The boiler is told to turn on, and the pump moves that heated water around the system.  The hot water reaches the three port valve.  The valve has an electrical actuator on which moves to set position depending on what electrical connections are made to it.  In our case, no INPUT power is being applied to the valve, so it sits in it’s default position – which just happens to be “Hot water mode”, and so the heated water from the boiler passes through the hot water cylinder only.  When the hot water cylinder get’s to the right temperature the thermostats clicks over to the other contactor and now no power is applied to the pump and boiler via the HW ON output on the controller.  Instead, the grey wire, point 7 on the connector, is energised.  This tells the valve that HW is no longer required.  This system is pretty safe, since as soon as the cylinder stat is triggered power is removed from the boiler and so it would shut down.  Now, thermostats do fail, but they usually “fail safe”, but sometimes they don’t.

What if I want just the heating to run?  The controller connects to the live input to the CH cable via point 4 on the connector.  This passes through the room stat which will allow the current to flow if it’s below the temperature set.  The current ends up at the valve via point 5 on the connector.  In this case, where we only want heating, the “white” wire is live (it’s black on the diagram) and the valve connects the “white” wire to the “orange” wire which goes back to point 8 on the connector, and in turn provides power to the boiler and pump.  At this point the “grey” wire is also energised, as the controller makes it’s “HW OFF” output live when you ask for only heating. The room stat is able to cut power to the circuit when it reaches the set temperature.

If we want both hot water and heating, the controller energises the “CH ON” and “HW ON” outputs.  Here current is provided to the pump and boiler when any of the thermostats indicates that more heating is required.  If the HW reaches it’s temperature first, then the stat energises the grey wire, which tells the valve that no more hot water is required, and so it will move to the CH ONLY position, and current will continue to be provided by the orange wire when the valve reaches the correct position.  The heated water from the boiler will stop circulating through the hot water cylinder and go only through the radiators – concentrating the heating to where it is needed.  Pretty neat!

This system strikes me as being both simple and brilliant at the same time.  It’s also pretty safe, as long as the stats are working as they should do.

In summary then, the controller is able to indicate a requirement for hot water, central heating, or both by linking three outputs to live in the right sequence.  The four states are therefore:

  1. HW OFF, CH OFF (0,0)
  2. HW OFF, CH ON (0,1)
  3. HW ON, CH OFF (1,0)
  4. HW ON, CH ON (1,1)

Confirming my deductions

I’ve looked at the plumbing, and I’ve looked at the wiring, and I’m pretty sure that I know what’s going on.  Next thing to do is apply the scientific method and gather the evidence to back up my assumptions.

Behind the heating controller

The first thing I did was to turn off the power to the heating system.  I’m paranoid, so I turned it off at the fused connection to the left of the controller and also at the fuse box.  I also wore rubber boots, and jumped in the air every time I touched a wire.  Better safe than sorry, eh?  And, rightly so it turns out.  The fused connection unit did actually cut all the power to the heating system, but look carefully at the third connection from the left and you’ll see an earth wire being used to carry live current.  This is against all the regulations.  Whoever installed this system originally was clearly a free spirit.  I was also quite impressed that they’d managed to squeeze all the connections in to a double gang back box.  What a mess.  Remember kids, only a competent person is allowed to fiddle with these things – they do a better quality job you see.

Looking at the zoomed in image you can see 6 terminals:  N, L 1, 2, 3, 4.

N & L are self explanatory.  2 is not connected to anything, and so I don’t need to worry about it.  So that leaves three connections that do something (1, 3 and 4).  Exactly what I was expecting.  One will be CH ON, one HW ON, and one HW off.  Which is which?

Looking at the back of the controller unit it’s self:

Rear of heating controller

My theory is sound!  1 is HW OFF, 3 is HW ON, 4 is CH ON.

It looks like everything is connected as I had expected, but better safe than sorry.  Let’s do a bit more testing:

Testing harness

I wired in a few bits of cable and then (not shown) removed the connections to the rest of the system (labelling where they came from when I removed them!).  I left the L & N connected.  To recap, I removed the existing wires from 1, 3 and 4 and replaced them with my cables which came down to some screw down connector blocks.  The reason I put connector blocks on the end was two fold.  Firstly, to make it easier to probe with my multimeter and secondly to stop me accidentally brushing against one of the cables and giving myself a shock.  I also labelled the permanent live with a bit of red heat-shrink, just so I don’t get confused.

Test harness 2

Putting the controlled back on, I hooked up my multimeter and switched through the options to see what happens when.  My findings are below:

.

.

.

.

.

Exactly what I expected.  Point 1 must, therefore be “HW OFF”, point 2 “HW ON” and point 3 “CH ON” – which they are, as we saw from the back of the controller.  I’m now confident enough with the set up to proceed with roughing out a block diagram for the controller and ordering the parts.

The plan

heating_controller_block  Heating Controller Crude

This rather unclear breadboard layout (with the awesome http://fritzing.org/home/) logically lays out what I intend to do.  I’ve also added a crude block diagram for good measure.

First, I will add a real-time clock module.  They’re cheap and easy to fit.  This will provide the Pi with a source of time when it can’t talk to NTP servers, and so it will be able to turn things on and off at the right times, even when the network connection is down.

Next I will add four relays.  I will take the main 240V incoming supply out of the existing controller and put it through relay 1.  This relay will pass the supply on to the existing controller via the “Normally Closed” relay output.  When I switch this relay, the supply to the existing controller will be dropped, and instead routed to the other three relays which will then be able to switch this current.  These three relays will be wired in parallel with the existing controller connections, much like in the image above showing the test harness connected in to the controller connections.  That is to say: one relay will go to point 1, one to point 3 and one to point 4.  The existing safety features (thermostats in series in the circuits) are un-changed and so still offer the same protection.  In order to activate heating or hot water we switch the relays as per the table above.  By adding my new system in parallel and being able to easily switch between the two I can bring the RasPi powered one online gradually.  A few hours here, a few hours there.  And once I’m happy that it’s not going to go crazy I can leave it unsupervised for longer and longer periods.  It also means that if I update the software and break something, we can still wash.

I will also add a number of 1wire temperature sensors.  I will have three on the hot water cylinder: 1 at each of the top, middle and bottom.  This will give me insight in to how much hot water is in the cylinder, and the temperature thereof.  This is not intended to be a safety system.  The temperature readings from these sensors will not be relied upon to switch things off in an emergency, that will be left to the original thermostats, but – we could use these readings as well to help make decisions.  I will also fit a temperature sensors in the cold water tank, the primary header tank and somewhere outside.  This will give me insight in to a couple of things:  Firstly, how cold is the water in the CW tank, and what is the temperature outside?  This has a direct effect on the number of showers that can be had from a given amount of hot water at a known temperature. Useful for trending too.  Secondly, fitting a sensor in the primary header tank can report when the header tank is getting hot.  Really, the header shouldn’t heat up too much.  If it does then either the system is “pumping over” – where the pump is forcing water up the vent pipe pipe OR the water is so hot it has expanded enough to push water out of the vent, or I expect some combination of the two.  Either situation is sub-optimal, and with a sensor in the header tank I get some visibility of what’s going on.  I might also add a sensor to the boiler input and output, so get an idea of how much work the boiler is doing.  Adding a sensor to each room would be a nice addition at some point too.

A couple of switches will be added for manually switching the hot water or heating on/off from the airing cupboard, where the current controller is situated and where the more senior visitors to Whizzy Towers will expect the heating buttons to be.

Shopping list:

Couple that with a few odd bits of wire, some LEDs a bit of Python and we should have ourselves a Raspberry Pi powered heating and hot water controller which is relatively safe, easy to remove and cheap to build.  Let’s see what happens when all the bits turn up.  Should be here in a week or so.

Stay tuned.

 

Further Updates

  1. Part 2
  2. Part 3
  3. Part 4