Self-Hosted Voice Control (for Paranoids) | Joshua Boniface, sysadmin

Building a self-hosted voice interface for HomeAssistant

Voice control is both a new and a quite old piece of the home automation puzzle. As far back as the 1960s, science fiction depicted seamless voice control of computers, culminating in what is, to me, one of Star Trek’s most endearing lines: “Computer, lights”, followed by the satisfying brightness of hands-free lighting!

In the last few years, real-life technology has finally progressed to the point that this is truly possible. While there have been many attempts over the years, the fact is that reliable voice recognition requires massive quantities of computing power, machine learning, and sample data. It’s something that truly requires “the cloud” to be workable. But with the rise of Google and Amazon voice appliances, the privacy implications of this have come into play. As a now-widely-circulated comic puts it, 30 years ago people were concerned about police wiretaps - now, they say “Wiretap, order me some duct tape”! And this is compounded by the proprietary nature of these appliances. Sure, the company may say that they don’t listen to you all the time, but without visibility into the hardware and software, how much can we really trust them?

Luckily, the free software community has a couple of answers. And today, it’s possible to build your own appliance! It still uses the Google/Amazon/Microsoft speech-to-text facilities, but by controlling the hardware and software, you can be sure that the device is only listening to you when you tell it to! Hopefully one day projects like Sphinx and Kaldi will be up to the task, but for now we’re stuck using the cloud players, for better or worse.

Hardware - The Raspberry Pi and ReSpeaker

The Raspberry Pi has pretty much become the go-to device for building small self-hosted appliance solutions. From wildlife cameras to a server BMC, the Raspberry Pi provides a fantastic base system for just about any small computing project you could want to build. This project makes use of the Raspberry Pi 3 model B, mainly because it’s the model most commonly available new, and because of the computing requirements of the software we will be using - the original Raspberry Pi doesn’t have enough computing power, and the Raspberry Pi 2 has some software consistency issues.

The second main component of this project is the Seeed Studio ReSpeaker (4-mic version). The ReSpeaker provides an array of 4 microphones, one on each corner of the square board, in addition to a ring of LEDs, giving a visual appearance similar to the Google and Amazon appliances. By integrating tightly with the Raspberry Pi, you can build a very compact unit that can be placed almost anywhere and with only a single incoming cord for power, assuming WiFi is in use.

Parts list

1x Raspberry Pi 3 (or newer)
1x SD Card for Raspberry Pi (8+ GB)
1x Power cord for Raspberry Pi
1x ReSpeaker 4-mic hat

Assembly

Assembly of the unit is very straightforward. The ReSpeaker attaches to the main Raspberry Pi GPIO pins, and sits above the board as seen in the picture on their site above. Once this is attached, the Raspberry Pi is ready to be installed and configured for it.

Software - Kalliope, ReSpeaker, and Raspbian

To start, this post doesn’t document my HomeAssistant configuration - to do so would require its own post entirely! What is important for our purposes though is that my HomeAssistant interface is exposing multiple API endpoints, one for each room, that handle the various lighting events that happen there. You can use this method for communicating almost anything to HomeAssistant via voice control.

For example, the following endpoint + data combination triggers a “lights-on” event for my bedroom:

curl -H 'X-HA-Access: MySuperSecretPassword' -H 'Content-Type: application/json' -X POST -d '{ "state": "on" }' https://myhomeassistantdomain.net:8123/api/events/bedroomlights
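For scripting, the same request can be sketched in Python using only the standard library. This is an illustration rather than part of the setup described here: the function names `build_event_request` and `fire_event` are made up for the example, and the URL, event name, and password are the placeholder values from the curl command above.

```python
import json
import urllib.request


def build_event_request(base_url, event, password, state):
    """Build (but do not send) the POST request equivalent to the curl call."""
    body = json.dumps({"state": state}).encode("utf-8")
    return urllib.request.Request(
        url="{}/api/events/{}".format(base_url, event),
        data=body,
        method="POST",
        headers={
            "X-HA-Access": password,
            "Content-Type": "application/json",
        },
    )


def fire_event(request, timeout=5):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder values taken from the curl example; replace with your own.
request = build_event_request(
    "https://myhomeassistantdomain.net:8123",
    "bedroomlights",
    "MySuperSecretPassword",
    "on",
)
# fire_event(request)  # uncomment to actually contact HomeAssistant
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before anything touches the network.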

With the HomeAssistant side set up, we can begin configuring the Raspberry Pi.

Kalliope

Kalliope is a free software (MIT-licensed) project to provide an always-on voice assistant. It is written in Python and features a very modular structure and extremely flexible configuration options. Unlike the commercial options, you can inspect the code and confirm that it does not report everything you say to some Internet service. It uses the Snowboy library to listen for, and wake on, a trigger word; you can then customize its behaviour based on the phrase received from your choice of speech-to-text provider (Google, Amazon, etc.). And since Snowboy runs locally, data is only sent to the cloud once the unit has been awoken by the trigger word.

I start with the official Kalliope image. The reason for this is twofold: first, the image provides a conveniently-configured system without having to manually pip install Kalliope, which even on a Raspberry Pi 3 takes upwards of an hour. Second, and most importantly, Snowboy appears to be broken with the latest Raspbian releases; it is impossible to properly compile it, and hence the pip install can fail in obscure ways, usually after you’ve already been compiling it for an hour. Using their pre-built image, and then upgrading it to the latest Raspbian, bypasses both problems and lets you get right to work.

Once you’ve written the Kalliope image to your SD card, boot it up, and then perform an upgrade to Raspbian Stretch (the image is Jessie):

pi@kalliope:~$ sudo find /etc/apt -name "*.list" -exec sed -i 's/jessie/stretch/g' {} \;
pi@kalliope:~$ sudo apt update && sudo apt upgrade -y
pi@kalliope:~$ sudo reboot

Once this finishes, you’ll be booted into your Raspbian Stretch system complete with Kalliope installed. I cover the configuration in a later section.

ReSpeaker Audio

The ReSpeaker library provides the drivers and utilities for using the ReSpeaker hat with Raspbian. Note however that this library won’t work on Raspbian Jessie, only Stretch, which is why we have to upgrade the Kalliope image first. Once the upgrade is finished, clone this repository into a local directory and follow the instructions provided. Verify that the driver is working by checking arecord -L and looking for ReSpeaker entries, then configure the volume of the microphones using alsamixer. I find that a gain of 90 with a volume of 75 works very well, since 100/100 results in nothing but noise. Your mileage here may vary, so do some test recordings and verify as recommended in the library README.
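If you are setting up several units, the arecord -L check can be automated. The sketch below is only an illustration: `list_capture_devices` and `find_devices` are hypothetical helper names, and the substring to search for depends on how the driver names the card on your system, so it is passed in as a parameter rather than hard-coded.

```python
import subprocess


def list_capture_devices():
    """Return the raw text output of `arecord -L`."""
    result = subprocess.run(
        ["arecord", "-L"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def find_devices(arecord_output, needle):
    """Return device names whose name or description contains `needle`.

    In `arecord -L` output, device names start at column 0 and their
    description lines are indented beneath them.
    """
    needle = needle.lower()
    matches = []
    current = None
    for line in arecord_output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # A new device name starts here.
            current = line.strip()
            if needle in current.lower():
                matches.append(current)
        elif current and needle in line.lower() and current not in matches:
            # The description of the current device mentions the needle.
            matches.append(current)
    return matches


# Example usage on the Pi (the search string is an assumption; adjust it
# to whatever name the driver registers on your system):
# print(find_devices(list_capture_devices(), "respeaker"))
```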

One downside, however: the ReSpeaker technically supports directional audio (as with, e.g., the Alexa, using the mic closest to you for optimal performance), but at the moment I don’t have this support in this project. I’m making use of PulseAudio to handle the incoming audio, rather than directly interfacing with the ReSpeaker unit, so this support would have to be built into Kalliope. It does work, but you don’t get the directional listening that you might expect from reading the ReSpeaker page!

ReSpeaker LEDs

The LED portion of the ReSpeaker requires a little more work. The examples library for the 4-mic hat provides all the basic tools needed to get the LEDs working, including several samples based on Google and Amazon device patterns. In my case, I went for a very simple LED feedback design: the LEDs turn on blue while listening, then quickly turn either green on a successful command, or red on a failed command, giving some sort of user feedback without having to listen to the unit try and talk!

To do this, I created a simple Python “daemon” running under Systemd to listen for commands on a FIFO pipe and perform the required action, as well as a helper client utility to trigger the pipe. The code for these can be found on my GitHub for convenience. One interesting feature of this configuration is the Systemd unit file. It performs a git pull inside the service directory (i.e. the repo directory) to ensure the service is automatically up-to-date when the service is started. I do the same thing in my Kalliope unit file for its configuration.
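To give a feel for the FIFO approach, here is a minimal sketch of such a daemon and its client. It is not the code from the repository: the function names and the RGB values are assumptions for illustration, and `set_leds` is a hypothetical callback standing in for whatever actually drives the LED ring. Only the command names (leds_blue, leds_green, leds_red, leds_off) come from this post.

```python
import os
import stat

# Command names as used in this post; the RGB values are assumptions.
COLOURS = {
    "leds_blue": (0, 0, 255),
    "leds_green": (0, 255, 0),
    "leds_red": (255, 0, 0),
    "leds_off": (0, 0, 0),
}


def ensure_fifo(path):
    """Create the FIFO if it is missing; refuse to reuse a regular file."""
    if not os.path.exists(path):
        os.mkfifo(path)
    elif not stat.S_ISFIFO(os.stat(path).st_mode):
        raise RuntimeError("{} exists and is not a FIFO".format(path))


def handle_command(line, set_leds):
    """Apply one command line; return False if the command is unknown."""
    command = line.strip()
    if command not in COLOURS:
        return False
    set_leds(COLOURS[command])
    return True


def serve(path, set_leds):
    """Daemon loop: read commands from the FIFO forever."""
    ensure_fifo(path)
    while True:
        # open() blocks until a writer connects; iteration ends at EOF,
        # i.e. when the writer closes, after which we reopen and wait again.
        with open(path, "r") as fifo:
            for line in fifo:
                handle_command(line, set_leds)


def send_command(path, command):
    """Client side: write one command to the FIFO (blocks until read)."""
    with open(path, "w") as fifo:
        fifo.write(command + "\n")
```

In this sketch the daemon would call something like `serve("/run/leds.fifo", my_led_function)` (both names hypothetical), while the client utility boils down to a single `send_command` call. Keeping the LED hardware behind a callback means the command handling can be tested without a ReSpeaker attached.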

Kalliope configuration

The next step is to actually configure Kalliope. The examples are a good starting point, but integrating everything together is a bit more work. Below is a sample of the brain.yml configuration for my instance, showing how it integrates the ReSpeaker LEDs directly, as well as posting to the HomeAssistant URL.

# Default/built-in orders
  - name: "order-not-found-synapse"
    signals: []
    neurons:
      - shell:
          cmd: /usr/bin/env python /srv/respeaker-led/trigger.py leds_red
      - shell:
          cmd: /bin/sleep 0.5
      - shell:
          cmd: /usr/bin/env python /srv/respeaker-led/trigger.py leds_off
      - shell:
          cmd: /bin/sleep 0.2

  - name: "on-triggered-synapse"
    signals: []
    neurons:
      - shell:
          cmd: /usr/bin/env python /srv/respeaker-led/trigger.py leds_blue

  - name: "on-start-synapse"
    signals: []
    neurons:
      - shell:
          cmd: /usr/bin/env python /srv/respeaker-led/trigger.py leds_off
      - shell:
          cmd: /bin/sleep 0.1

# Custom orders
  - name: "order-lights-on"
    signals:
      - order:
          text: "lights"
          matching-type: "normal"
      - order:
          text: "lights on"
          matching-type: "normal"
      - order:
          text: "turn on lights"
          matching-type: "normal"
      - order:
          text: "full brightness"
          matching-type: "normal"
      - order:
          text: "all lights on"
          matching-type: "normal"
    neurons:
      - shell:
          cmd: /usr/bin/env python /srv/respeaker-led/trigger.py leds_green
      - uri:
          url: "https://myhomeassistantdomain.net:8123/api/events/bedroomlights"
          headers:
            x-ha-access: MySuperSecretPassword
            Content-Type: application/json
          method: POST
          data: "{ \"state\": \"on\" }"
      - shell:
          cmd: /bin/sleep 0.4
      - shell:
          cmd: /usr/bin/env python /srv/respeaker-led/trigger.py leds_off
      - shell:
          cmd: /bin/sleep 0.2

Using this configuration as a jumping-off point, you can add as many other orders as you like, and by including the various shell commands you can ensure that the LED ring shows the status of every task. So far, the only downside I’ve found with Kalliope is that single-word triggers are generally unsupported; the device doesn’t realize when to stop listening, so try to keep them to two or more words.
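As the list of orders grows, a small check can flag any that are shorter than two words. The sketch below is an illustration only: `single_word_orders` is a hypothetical helper, and it assumes the brain has already been loaded into plain Python lists and dicts (for example with a YAML parser, which is not shown here) in the same shape as the sample above.

```python
def single_word_orders(synapses):
    """Return (synapse name, order text) pairs for orders under two words."""
    flagged = []
    for synapse in synapses:
        for signal in synapse.get("signals") or []:
            if "order" not in signal:
                continue  # not an order signal, nothing to check
            order = signal["order"]
            # An order may be a dict with a "text" key (as in the sample
            # above) or a plain string.
            text = order.get("text", "") if isinstance(order, dict) else str(order)
            if len(text.split()) < 2:
                flagged.append((synapse.get("name"), text))
    return flagged


# A cut-down version of the sample brain, already parsed into Python data.
brain = [
    {"name": "on-start-synapse", "signals": []},
    {
        "name": "order-lights-on",
        "signals": [
            {"order": {"text": "lights", "matching-type": "normal"}},
            {"order": {"text": "lights on", "matching-type": "normal"}},
        ],
    },
]

for name, text in single_word_orders(brain):
    print("{}: '{}' is a single-word order".format(name, text))
```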

I use a custom Systemd unit to ensure everything is started correctly, including output buffering. As mentioned above, it also ensures the configuration repository is always up-to-date with the origin, making on-the-fly configuration updates to multiple devices quick and painless.

# Kalliope service unit file
[Unit]
Description = Kalliope voice assistant
After = network-online.target

[Service]
Type = simple
WorkingDirectory = /srv/kalliope-config
User = kalliope
ExecStartPre = /usr/bin/pulseaudio --daemon
ExecStartPre = /usr/bin/ssh-agent bash -c 'ssh-add /srv/git-deploy.key; git pull; exit 0'
ExecStart = /usr/bin/stdbuf -oL /usr/local/bin/kalliope start

[Install]
WantedBy = multi-user.target

Install and enable the Systemd unit file using a full path; this is a relatively little-known feature of systemctl that comes in handy here:

pi@kalliope:~$ sudo systemctl enable /srv/kalliope-config/kalliope.service
pi@kalliope:~$ sudo systemctl start kalliope.service

The Next Steps

With all of this assembled, you can test out the system and make sure it’s doing what you want. Here’s a sample video of my unit in action. I will probably be building a few more (and getting a few more WeMo switches and dimmers) soon!

Thank you for checking out this project, and feel free to send me any feedback! Hopefully this helps someone else build up their voice-controlled home automation system!