Transcending the Stack with the Right Network Protocol

The increase in network-connected devices over the past years has been something of a double-edged sword. While on one hand it’s really nice to have an easy and straightforward method to have devices talk with each other, this also comes with a whole host of complications, mostly related to reliability and security.

With WiFi, integrating new devices into the network is much trickier than with Ethernet or CAN, and security (e.g. WPA and TLS) isn’t optional any more, because physical access to the network fabric can no longer be restricted. Add to this reliability issues due to interference from nearby competing WiFi networks and other sources of electromagnetic noise, and things get fairly complicated already before considering which top-layer communication protocol one should use.

In this article we’ll be looking at implementing such a network-based system, securing a WiFi network with TLS, and the use of MQTT in combination with a proxy. I’ll illustrate this using experiences and lessons learned while working on the Building Management and Control (BMaC) project that I covered in a previous article.

Getting MQTT into your system

Message Queuing Telemetry Transport (MQTT) is a small, binary protocol that was developed by Andy Stanford-Clark of IBM and Arlen Nipper of Cirrus Link in 1999. Version 3.1 was submitted by IBM to OASIS for standardization in 2013. A variant of MQTT is MQTT-SN, which is designed for lower-bandwidth, non-TCP networks, such as Zigbee, UDP and Bluetooth.

Due to its compact size and simple client-server architecture, it is highly suitable for connecting networks of sensors both large and small, especially in high-latency, low-bandwidth situations. It uses a publish-subscribe model, where clients can subscribe to topics on which others can publish messages. These messages can be persistent, have guaranteed delivery and (as of version 5) can automatically expire if they cannot be delivered.
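To make the publish-subscribe model a bit more concrete, here is a minimal sketch of how a broker decides whether a subscription matches a published topic, using MQTT's standard `+` (single-level) and `#` (multi-level) wildcards. The topic names are made up for illustration and are not the ones BMaC used, and the sketch leaves out details such as the special handling of topics starting with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name.

    Simplified: topics starting with '$' are not treated specially here.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level, and it
            # matches the parent level plus everything below it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            # The filter has more levels than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # '+' matches exactly one level; anything else must be equal.
            return False

    # All filter levels matched; the topic must not have extra levels left.
    return len(filter_levels) == len(topic_levels)


if __name__ == "__main__":
    # Hypothetical topic layout: <project>/<node>/<sensor>
    print(topic_matches("bmac/+/temperature", "bmac/office1/temperature"))
    print(topic_matches("bmac/#", "bmac/office1/fan/speed"))
```

A client subscribed to the filter `bmac/+/temperature` would receive the temperature readings of every node without knowing the node names in advance.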

The obvious advantage of MQTT is that it supports everything from always-online, high-bandwidth clients, to low-powered, remote sensor nodes which just wake up every week, dial into a satellite link and send some sensor readings while updating their calibration settings from data which they receive at the same time from some other client.

The use of MQTT (with the Mosquitto MQTT broker) in the BMaC project was initially more of a coincidence, with us using it mostly because an MQTT client was already integrated into the framework we were using on the microcontrollers. None of us had really thought about the advantages or disadvantages of MQTT over any alternatives. Now, years later, it’s easy to see why MQTT was the right choice. While running it on an internal, TCP-based network, we got the guaranteed delivery aspect of TCP along with its built-in checksum verification, with the MQTT protocol itself putting no constraints on the payload it can carry, whether it be text-based or binary.

Real competition for MQTT does not really exist. AMQP is also fairly popular, but it targets desktop and server systems in an enterprise setting, and doesn’t really scale down to RAM-constrained 8-bit microcontrollers. Further, AMQP also defines an encoding scheme for the payload, whereas MQTT leaves one free to use whichever encoding or serialization scheme one wishes to use.

With BMaC we could thus develop our own payload format that would be sent to and from the ESP8266-based nodes. This resulted in a compact, binary format using just a few bytes at most that sufficed to configure nodes over MQTT as well as adjust the fan and relay settings.
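As a rough illustration of what such a payload can look like, the sketch below packs a command into four bytes with Python's `struct` module. The field layout (command ID, relay bitmask, fan duty cycle) is purely hypothetical and is not the actual BMaC format.

```python
import struct

# Hypothetical layout, NOT the real BMaC format:
#   byte 0    : command ID
#   byte 1    : relay bitmask (bit n set = relay n on)
#   bytes 2-3 : fan duty cycle in 0.1 % steps, little-endian uint16
NODE_COMMAND = struct.Struct("<BBH")


def encode_command(command_id: int, relay_mask: int, fan_duty: int) -> bytes:
    """Pack a command into a 4-byte binary MQTT payload."""
    return NODE_COMMAND.pack(command_id, relay_mask, fan_duty)


def decode_command(payload: bytes) -> tuple:
    """Unpack a 4-byte payload into (command_id, relay_mask, fan_duty)."""
    if len(payload) != NODE_COMMAND.size:
        raise ValueError("unexpected payload length: %d" % len(payload))
    return NODE_COMMAND.unpack(payload)


if __name__ == "__main__":
    payload = encode_command(1, 0b00000101, 750)
    print(payload.hex())
    print(decode_command(payload))
```

Because MQTT treats the payload as an opaque sequence of bytes, the receiving side only has to agree on the layout; the broker never needs to understand it.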

Securing the System

The best way to secure a system is through the practice of security in depth. That means that every part that could be exploited should be secured in some fashion. Assuming a system like that of BMaC, this means that the physical hardware is all inside an office building, which has its own security system installed.

This security system can be simple mechanical locks, or some NFC tag-based system. Sensitive areas like server rooms require their own access keys or permissions associated with the NFC tags. This practically eliminates any risk of unauthorized individuals gaining access to the hardware, let alone performing any nefarious actions.

Wireless networks for a system like this are of course secured by WPA2 or similar, meaning that without the right password or certificate, one cannot connect to the wireless network. Any traffic on the network will consequently be encrypted. This shifts the most likely threat to those who somehow have gained access to the network, whether through legal means, or because the WiFi SSID and password were in a photograph that got published on the public company blog (true story).

At this point we have an okay level of security, but the missing ingredient is to secure the traffic between the nodes and the backend servers, meaning either TLS encryption (very common), or Elliptic Curve Cryptography (ECC), which would be the superior choice because it’s faster, requires significantly less RAM, and has much smaller certificates. Unfortunately ECC has taken the backseat to TLS, mostly on account of it being patent-encumbered for much longer.

This made TLS the easier type to integrate into the BMaC project, as adding ECC would have meant ditching the axTLS library in the framework which we were using for the ESP8266 nodes and integrating an alternate library that supports ECC and also fits in the limited RAM provided by this microcontroller.

The Part Where Things go Boom

We quickly found out that the default handshake setting in TLS encryption for TCP connections causes massive problems for an ESP8266 and similar MCUs which tend to have less than 30 kB of SRAM available when this handshake event occurs.

Namely, the default TLS configuration dictates that the maximum TX/RX buffer sizes are allocated when a secure connection is attempted, being 16 kB each, or 32 kB in total. With non-trivial firmware this results in the MCU running out of memory and resetting. Fortunately this setting can be changed on the side of the server, as noted in this article on TLS. This would allow the server to set the TLS buffer size to something that would fit in the MCU’s SRAM.

Sadly for BMaC, the server on the Mosquitto MQTT broker didn’t have this as a configuration setting, requiring us to change it in the source code and recompile the server. That seemed a bit of overkill.

Instead we opted to add a different TLS endpoint to the system, using HAProxy as an intermediate. We configured an interface with TLS-only access that simply routes any decrypted data to Mosquitto via the localhost loopback interface, and set the tune.ssl.maxrecord property to 2 kB, for 4 kB of buffer space on the ESP8266. After enabling both server and client certificates on the HAProxy and BMaC node firmware respectively, we had a TLS-encrypted connection up and running, ensuring that not even our colleagues could sniff on what we were doing.
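The buffer arithmetic behind this choice can be sketched in a few lines. This is a simplified model that assumes one transmit and one receive buffer, each sized to hold a single maximum-size record, and it ignores per-record overhead and whatever else the TLS library allocates during the handshake. The 30 kB free-SRAM figure is the rough upper bound mentioned earlier, not a measured value.

```python
KIB = 1024


def tls_buffer_bytes(max_record_bytes: int) -> int:
    """Space for one TX and one RX buffer, each holding a single record."""
    return 2 * max_record_bytes


def fits_in_sram(max_record_bytes: int, free_sram_bytes: int) -> bool:
    """Check whether both TLS buffers fit into the SRAM that is still free."""
    return tls_buffer_bytes(max_record_bytes) <= free_sram_bytes


if __name__ == "__main__":
    free_sram = 30 * KIB  # assumed upper bound on free SRAM at handshake time

    for record_size in (16 * KIB, 2 * KIB):
        total = tls_buffer_bytes(record_size)
        verdict = "fits" if fits_in_sram(record_size, free_sram) else "does not fit"
        print("%d-byte records need %d bytes of buffers: %s"
              % (record_size, total, verdict))
```

With the default 16 kB records the buffers alone exceed the available memory, while 2 kB records leave most of the SRAM to the firmware.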

Putting it Together

By the time we had finished wiring up the first controller for the air conditioning system at the office, the BMaC project consisted of a wireless network of motion, temperature, CO2, air pressure and coffee usage sensors, along with a bunch of relays and fan controllers, all tied together using a central backend server and secure MQTT connections.

After getting the network set up, with MQTT secured using client-side certificates to make sure that only genuine BMaC MQTT clients could connect, it was very nice to be able to focus on getting the commands and data transferred between the nodes and the backend. The only issue that really annoyed me there was the lack of an MQTT desktop client that would allow me to do MQTT monitoring, active topic discovery and be directly compatible with binary payloads instead of assuming that one would only ever use MQTT for text-based payloads.

This led to me developing a C++/Qt-based MQTT desktop client called MQTTCute. It’s the client I wish I had had right from the beginning as I was setting up the whole system, trying to get an idea of what was being sent around on the MQTT topics. Since we ended up using a binary protocol for BMaC, having a built-in hex view function in the desktop client would have been invaluable.

Regardless, if we had to do it all again, with the knowledge we gained, we would pretty much still have picked the same route. Likely we would try to use ECC instead of TLS, however, just to save ourselves the overhead of using an additional TLS endpoint and proxy server.

We also found that a number of MQTT libraries assumed text-based payloads, and would use C functions like strlen() and kin. Many of them have since received pull requests from yours truly so that those libraries now happily accept any kind of binary data one wishes to send via MQTT, including images.
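The problem with strlen() is easy to demonstrate: it stops counting at the first zero byte, and zero bytes are perfectly normal in binary data. The sketch below mimics that behaviour in Python on a made-up four-byte payload; it illustrates the pitfall and is not code taken from any of the affected libraries.

```python
def c_strlen(buffer: bytes) -> int:
    """Mimic C's strlen(): count bytes up to the first NUL byte.

    Real C code would read past the end of a buffer that has no NUL;
    here we simply return the full length in that case.
    """
    nul_index = buffer.find(b"\x00")
    return len(buffer) if nul_index == -1 else nul_index


if __name__ == "__main__":
    text_payload = b"23.5"                            # text happens to work
    binary_payload = bytes([0x01, 0x00, 0xEE, 0x02])  # hypothetical binary message

    print(c_strlen(text_payload), len(text_payload))
    print(c_strlen(binary_payload), len(binary_payload))
```

A library that derives the payload length this way would silently publish only the first byte of the binary message, which is why the length has to be passed explicitly alongside the buffer.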

The Elephant in the Room

When it comes to MQTT and similar client-broker systems, there’s always the argument that they cannot be reliable because they have a single point of failure in the form of the MQTT broker. This is definitely a valid point, but also not nearly as valid as one might assume.

MQTT brokers tend to run on reliable server hardware, in the case of BMaC as a Linux virtual machine instance on a storage cluster. For the broker to suddenly vanish off the network would require the kind of catastrophic failure that’d cripple the company’s network along with it.

One could conceivably set up a second, failover MQTT broker on a secondary address, but that would be a lot of work without good cause. In our own year-long BMaC development process, we had zero failures of the Mosquitto broker and more issues with glitches in the (old) WiFi access points.

14 thoughts on “Transcending the Stack with the Right Network Protocol”

  1. I recommend anyone looking to secure MQTT traffic on an ESP32 or 8266 look at PreShared Keys. They are more straightforward to implement, and are probably good enough for what we do. If you aren’t generating a certificate specifically for each device with some validatable information (like the IP address of the MQTT host and clients), you really aren’t getting any additional security with PKI certificates. If you are using the same client cert everywhere, you just have a bunch of key material on every device that could be downloaded and reused on a nefarious device, which is basically what you get with a preshared key.

  2. You’re calling security in width, security in depth. They are very much NOT the same thing. Covering each piece is width, having a backup on each thing is depth – like having a burglar alarm that brings the cops, but also having a safe to slow down the bad guys till the cops can get there – that’s what depth is (shallow, but at all). Just covering an attack surface one layer deep isn’t depth at all, and very misleading to those who don’t know this security stuff already.

    1. Thanks for the article, nice that he doesn’t go on a tantrum and state MQTT is useless, just points out its flaws. And I agree with him.

      One thing I’d like to note: as a lesser level of security for low-end devices, if the data isn’t secret but a level of trust is needed, consider signing the data. I know there’s an inexpensive chip that can handle that. I have a few, just haven’t gotten to them yet (round tuit).

    2. I’m one of those described in the article as having MQTT bedsheets. Or at least an MQTT pillow case.

      Folks who are used to all-encompassing protocols complain that MQTT is missing things, like standardized payloads and discovery mechanisms. And that’s true: these are details left up to you to implement. If one sensor reports in Celsius and another in Fahrenheit, that’s on you.

      The T in MQTT stands for “transport”, and it’s really about getting the data from here to there. The linked article seems to think it should be doing a lot more.

      You are left to design your own protocol, interpret the data, and build in as much security as you need. Is that good or bad? Probably both, like you say.

    3. I saw lots of posts like that while I was evaluating MQTT, but ultimately those are the reasons I chose it.

      I’ll worry about interoperability myself, at the application layer. It will need to vary anyways, from a small ARM system to a desktop client.

      What I get from MQTT is that I can route the data transparently. Maybe it is point to point at first, but later I want to have the data on a feed. I don’t necessarily have to change the client at all to make that happen.

      Generally, then I use Protocol Buffers (nanopb) to describe the data. For many use cases something like AVRO might be better for that layer, since it is self-describing.

      When the alternative is to roll my own everything, it hardly seems fair to complain that the tool doesn’t do everything. It does some of the parts I don’t want to have to implement repeatedly.

  3. Comparing TLS with ECC is like comparing apples to …. apple seeds, I suppose. ECC is one of the asymmetric ciphers available within TLS. Perhaps you’re thinking of NaCl or one of the crypto libraries smaller and more focused than TLS? Better to use one of the low-resource cipher suites made specifically for embedded applications, perhaps; or as [Kevin Kessler] suggests, PSK.

  4. ‘transcending’ MQTT would be to leave the protocol. There’s no sense in the large overhead. If you really want security or optimization, you would probably want to make your own using something like ZeroMQ as the message handler.

    http://zeromq.org/

    Or, maybe an already established protocol like DDS (if it’ll fit).

  5. I searched for quite a while for a protocol that exactly fit my application, and finally I just decided to do my own. I needed good enough security (as in “No pro cryptographer has time to mess with this, but script kiddies need to stay out”), remote procedure calls, and reliable messaging. I also needed ESP8266/32 support from within the IDE, and I wanted multicasting.

    But most importantly, I wanted the whole thing to pretend that it’s connectionless. The concept of a broken connection shouldn’t be exposed to the user when it’s meant to run on crappy WiFi. If a device is unreachable, RPC calls will fail, but if the device comes back, the protocol should reconnect.

    It was one of the more interesting projects I’ve done. NaCl was the obvious choice for security, and UDP was obvious for transport, but designing APIs I was happy with and getting everything to work reliably took a long time.

  6. If you use MQTT you might want to take a peek at https://mqtt-explorer.com, it is a great MQTT Client.
    – It gives you an easy start when you are new to MQTT
    – provides you a great overview of all topics
    – shows you your existing and legacy architecture (things you totally forgot about)

    Great article and I’m really loving the binary features of MQTTCute
