Developing Alexa Skills

In this article we’ll explore how to add custom skills to your Alexa-powered device. The material in this blog post was tested on an Echo Dot. However, Amazon recently enabled Alexa on most Android-powered smartphones, so the barrier to entry for developing these skills has been lowered significantly.

I actually ended up implementing the following features into my custom Alexa skill:

  • Switching on my PC
  • Controlling the TV power and TV source
  • Getting the current euro value of Ethereum

In this article we’ll focus on the second item, controlling the TV. If you’re interested in exploring either of the other two topics, please leave a vote in the poll at the end of the article.

Here’s a video of the results from what we’ll be building here:

The equipment used in this project, apart from Alexa, was the following:

  • A Raspberry Pi connected via HDMI to the TV. In our case, the Raspberry Pi is loaded with OSMC / Kodi. The Raspberry Pi also needs a connection to the internet.
  • The TV itself is a Samsung Smart TV. I don’t think this point is very important, as I’ll discuss later; it is included here for completeness.

Step 1: Getting to know HDMI-CEC

You have several options when it comes to controlling your TV from the Raspberry Pi. You could hook into your TV’s API if it has one, or go old-school and use actual infrared to control your TV just like your remote control does, using some very cheap hardware and LIRC on the Raspberry Pi. However, these options are either vendor-dependent (or even worse, firmware-version-dependent) or are a bit clunky to set up and require line-of-sight. HDMI-CEC avoids both problems. There’s plenty of literature on HDMI-CEC out there on the internet, so we won’t go into the details here. In a nutshell, HDMI is not one-way like VGA back in the day: it is a two-way link that allows the video player to send messages to the TV and vice versa. Although HDMI-CEC goes by different names depending on the vendor, it is relatively vendor-agnostic and works on multiple TV models.

The installation on the Raspberry Pi is a simple one-liner:

sudo apt-get install cec-utils

The main program you’d use is “cec-client” and it’s best documented in this tutorial blog post:

https://blog.gordonturner.com/2016/12/14/using-cec-client-on-a-raspberry-pi/

I managed to find all the codes I needed from the excellent cec-o-matic site:

http://www.cec-o-matic.com/

This results in the following four commands that we’ll use in our program:

Switch the TV on:
                 echo "on 0" | cec-client RPI -s -d 1
Switch the TV off:
                 echo "standby 0" | cec-client RPI -s -d 1
Change the TV source to Kodi:
                 echo "as" | cec-client RPI -s -d 1
Change the TV source to satellite:
                 echo "tx 1F:82:30:00" | cec-client RPI -s -d 1

We can do a simple echo “as” when switching to Kodi because the Raspberry Pi simply sets itself as the Active Source. When switching to satellite, we used cec-o-matic with the following settings:

[Screenshot: cec-o-matic settings]

Notes:

  • The source is set to Recording 1, which we got from using cec-client to scan the HDMI bus, like so:
    echo "scan" | cec-client RPI -s -d 1
  • The destination is set to broadcast
  • The physical address is set to 3.0.0.0, which we also get from the cec-client scan; just look for the physical address of the HDMI source you would like to switch to
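To make the byte layout of that last command concrete, here is a minimal Python sketch that assembles the same "tx" string from the three settings in the notes above. The function name and constants are my own for illustration (they are not part of cec-client), and the sketch assumes the usual CEC layout: a header byte made of source and destination, then the opcode 82 (Active Source), then the physical address packed into two bytes.

```python
# Values taken from the notes above; the names are illustrative only.
RECORDING_1 = 0x1      # logical address of the source reported by the scan
BROADCAST = 0xF        # destination: broadcast to every device on the bus
ACTIVE_SOURCE = 0x82   # opcode of the "Active Source" message


def active_source_frame(physical_address, source=RECORDING_1):
    """Build the cec-client "tx" string that announces a new active source.

    physical_address is the dotted form shown by the scan, e.g. "3.0.0.0".
    """
    digits = [int(part, 16) for part in physical_address.split(".")]
    if len(digits) != 4 or any(d < 0 or d > 0xF for d in digits):
        raise ValueError("expected four hex digits such as '3.0.0.0'")
    header = (source << 4) | BROADCAST
    # Each pair of address digits is packed into one byte: 3.0.0.0 -> 30 00
    operands = [(digits[0] << 4) | digits[1], (digits[2] << 4) | digits[3]]
    frame = [header, ACTIVE_SOURCE] + operands
    return "tx " + ":".join("{:02X}".format(byte) for byte in frame)


print(active_source_frame("3.0.0.0"))  # tx 1F:82:30:00
```

If your scan reports a different physical address for the input you want, only the last two bytes should change.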

Those are the only commands we’ll use in this blog post, but you can also monitor CEC commands using cec-client to implement pretty much anything:

https://ubuntu-mate.community/t/controlling-raspberry-pi-with-tv-remote-using-hdmi-cec/4250

Step 2: Preparing the Raspberry Pi

Now that we have cec-client sorted out, we need a framework that allows Alexa to execute these commands. I’ve seen some implementations use Firebase to allow the Pi to communicate with Alexa:

https://medium.com/@vishmathur5/alexa-turn-on-my-tv-bcaccc94f1c2

It’s a pretty neat solution, but a more elegant one (because we don’t need Firebase) is to use the nifty Python library “Flask-Ask” from John Wheeler. Amazon has a simple tutorial you can follow step by step to get up and running:

https://developer.amazon.com/blogs/post/Tx14R0IYYGH3SKT/Flask-Ask-A-New-Python-Framework-for-Rapid-Alexa-Skills-Kit-Development

Once you have set up your memory game, we can proceed to modify the code to include the following functions:

The code should be pretty straightforward: we define two intents, one to switch the TV on/off, and the other to change the source. Each intent accepts one input variable, which controls the output of the function. Depending on this input variable, we issue one of the cec-client commands we explored in the previous section, which is executed directly by the OS using the subprocess module in lines 6, 9, 18 and 21.

NB: we use hardcoded commands. With “shell=True”, building the command from user input would make these subprocess calls insecure (shell injection).
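As a rough illustration of the logic described above, here is a minimal sketch of the two handlers. It is not the exact listing this post refers to: the helper names, the spoken messages and the source slot values “kodi” and “satellite” are assumptions of mine. The Flask-Ask wiring is only shown in comments so the lookup logic can be read (and tried) on its own.

```python
import subprocess

# Hardcoded cec-client commands from Step 1, keyed by the slot value Alexa passes in.
POWER_COMMANDS = {
    "on": 'echo "on 0" | cec-client RPI -s -d 1',
    "off": 'echo "standby 0" | cec-client RPI -s -d 1',
}
# The keys "kodi" and "satellite" are assumed slot values.
SOURCE_COMMANDS = {
    "kodi": 'echo "as" | cec-client RPI -s -d 1',
    "satellite": 'echo "tx 1F:82:30:00" | cec-client RPI -s -d 1',
}


def run_fixed_command(table, value, runner=subprocess.call):
    """Run the command stored for `value`; return True if one was found.

    Only strings taken from `table` reach the shell, so the spoken slot
    value is never interpolated into a command line.
    """
    command = table.get((value or "").lower())
    if command is None:
        return False
    runner(command, shell=True)
    return True


def tv_power(power, runner=subprocess.call):
    """Logic for TVPowerIntent; returns the text Alexa should read out."""
    if run_fixed_command(POWER_COMMANDS, power, runner):
        return "Turning the TV {}".format(power.lower())
    return "Sorry, I did not catch whether you want the TV on or off"


def tv_source(source, runner=subprocess.call):
    """Logic for TVSourceIntent; returns the text Alexa should read out."""
    if run_fixed_command(SOURCE_COMMANDS, source, runner):
        return "Switching the TV to {}".format(source.lower())
    return "Sorry, I do not know that source"


# Wiring with Flask-Ask (not executed here), following its tutorial:
#   @ask.intent("TVPowerIntent")
#   def power_intent(power):
#       return statement(tv_power(power))

# Dry run with a stand-in runner, so nothing is sent to the TV:
print(tv_power("on", runner=lambda command, shell: print("would run:", command)))
```

Keeping the commands in a lookup table means an unexpected slot value simply falls through to the apology message instead of reaching the shell.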

Once the command is run, we return a statement containing a message which Alexa will read out, which brings us to our next section…

Step 3: Building your Alexa Skill

The final step is to connect Alexa to the program you just wrote above. Broadly speaking, an interaction follows three steps:

  • The user triggers Alexa into listening mode by using the trigger word “Alexa” (by default)
  • Next, the user speaks a phrase that includes the invocation name of your program, which lets Alexa know that what’s coming next is meant as a command for your program. A word of caution here: make sure to use a clear but relatively unique invocation name. If you use an easily misunderstood word like “Jarvis”, Alexa might not realise you mean to invoke your program. On the other hand, if you use a common invocation name like “Home Assistant”, some other published skill probably has that same invocation name, leading Alexa to enable that published skill rather than using your program. In my case, I used the invocation name “batman”, which is easily understood by Alexa and doesn’t have any competing published skills. So in this case the invocation phrases I could use are:
    • Alexa, ask batman to …
    • Alexa, open batman and …
  • Next comes the intent. These map directly to the intents we wrote in our program. In our case, we have two intents (lines 1 and 13), the TVPowerIntent and the TVSourceIntent. Make a note of these names, because we need to tell the Alexa SDK which utterances or phrases the user can speak to trigger these intents. For example, we might want the following:
    • Alexa, ask batman to switch on the television => TVPowerIntent, with variable “power” = on

As a summary, we’d end up with the following command structure:

[Diagram: Alexa command structure]

This mapping is done via the Amazon Developer Console. Make sure to log in to the developer console using the same account with which the Echo is registered.

If you followed the tutorial for flask-ask, you probably already have a skill defined. Just make sure that the invocation of the skill matches the invocation phrase that you’d like to use. In my case, the skill invocation is “batman”. The following steps assume you are using the Interaction Model Builder.

  • On the left hand side, under intents, click “add new”. We define two intents, with the same names as the intents we defined in flask-ask above

[Screenshot: intents]

  • For each intent, we define a variable (or slot) to pass into our function. The name of the slot has to match the variables you defined in the @ask.intent lines of flask-ask:

[Screenshot: slots]

  • Each slot is given a set of accepted values. You can use the built-in Alexa values for words, names, or numbers. In my case I defined a custom value set (called “power_states” above), which has only two accepted values, on and off:

[Screenshot: slot values]

  • Last, for each intent we define a set of utterances which will trigger our intent – our “intent phrase”. Put in as many variations as you can think of, to make it easier to call your program:

[Screenshot: intent phrases]

Note how the name of the variable or slot is {enclosed in curly brackets}.

The steps to make Amazon invoke your program written in Flask-Ask are exactly the same as in the tutorial we referenced previously.

That’s it… save and build your model. (complete generated model code included at the end of the article)
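For orientation, here is a hedged sketch of what such a model might contain, written as a Python dictionary and printed as JSON. It is not the model generated for this skill. The invocation name, the two intent names, the “power” slot and the “power_states” type come from the steps above; the “source” slot, the “tv_sources” type, its values and all sample utterances are made up. The nesting follows the JSON layout the developer console uses as far as I know, which may differ from what your console version generates.

```python
import json


def make_intent(name, slot_name, slot_type, samples):
    """One intent with a single slot and its sample utterances."""
    return {
        "name": name,
        "slots": [{"name": slot_name, "type": slot_type}],
        "samples": samples,
    }


def make_slot_type(name, values):
    """A custom slot type with a fixed list of accepted values."""
    return {"name": name, "values": [{"name": {"value": v}} for v in values]}


interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "batman",
            "intents": [
                make_intent(
                    "TVPowerIntent", "power", "power_states",
                    ["switch {power} the television", "turn the tv {power}"],
                ),
                # Slot name, type and utterances below are hypothetical.
                make_intent(
                    "TVSourceIntent", "source", "tv_sources",
                    ["switch the television to {source}", "put on {source}"],
                ),
            ],
            "types": [
                make_slot_type("power_states", ["on", "off"]),
                make_slot_type("tv_sources", ["kodi", "satellite"]),
            ],
        }
    }
}

print(json.dumps(interaction_model, indent=2))
```

The slot names here are the ones the handler functions receive as arguments, which is why they have to match on both sides.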

Nugget Post: Reactive Functions to parse nested objects

Note this article assumes familiarity with the Observer Pattern / Reactive Programming as described here: http://reactivex.io/

Some APIs return complex nested JSON objects. For example, take this cleaned up sample response from ElasticSearch (which incidentally is used to build the “Data Table” visualization):

Note the structure of the object. Within the top-level “aggregations” object we see a recursive nested structure: each nested object has a “buckets” entry, which contains an array of objects, and each of those objects also contains a “key”. The question now is, how do we efficiently traverse the above object to extract each “key” value while retaining the parent object’s “key” as well? To further illustrate, taking a subset of the example above:

[Figure: rxBlog1.png]

 

It was actually easier for me to reason about the above using imperative style programming, which would look something like this:
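A minimal sketch of that imperative style, assuming three levels of nesting, could look like the following. The sample data is invented and heavily trimmed: only the names “aggregations”, “buckets” and “key” come from the description above, while the “child” field and all key values are hypothetical stand-ins for whatever the real response uses.

```python
# Hypothetical, trimmed-down response; "child" is a made-up field name.
response = {
    "aggregations": {
        "buckets": [
            {"key": "a", "child": {"buckets": [
                {"key": "a1", "child": {"buckets": [{"key": "x"}, {"key": "y"}]}},
            ]}},
            {"key": "b", "child": {"buckets": [
                {"key": "b1", "child": {"buckets": [{"key": "z"}]}},
            ]}},
        ]
    }
}


def key_paths(aggregation):
    """Collect (outer key, middle key, inner key) tuples with nested loops."""
    paths = []
    for outer in aggregation["buckets"]:
        for middle in outer["child"]["buckets"]:
            for inner in middle["child"]["buckets"]:
                # Each loop variable keeps its parent's key in scope.
                paths.append((outer["key"], middle["key"], inner["key"]))
    return paths


print(key_paths(response["aggregations"]))
# [('a', 'a1', 'x'), ('a', 'a1', 'y'), ('b', 'b1', 'z')]
```

The parent keys stay available simply because the outer loop variables are still in scope, which is the part the reactive version has to reproduce explicitly.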

However, the idea is to use ReactiveX programming to traverse the tree in order to make the code more concise. At each key, the program should “pass down” the key to its child observables, right down to the final child, which would then emit the result to a subscriber. This is what we end up with (in RxPY):

 

Let’s step through the code:

  • Lines 3-4: If you notice, each object (which I call “aggregation” in the code) contains an object called “buckets” which is an array. We can create observables from arrays, so this function simply grabs the “buckets” array of an arbitrary aggregation and returns an observable
  • Lines 6-7: This is the first time we call the getAggregation function to return an observable. Now we have an observable emitting the outer objects. We need access to the next inner object, which itself contains another “buckets” array that we can turn into a “child observable”. Therefore each object (which I call “transaction” in the code) is passed into the getAggregation function once again. However, we would like to pass on the parent’s “key” value to every emission from these child observables. That is the role of the map function, which prepends the key to the actual emission.
  • At this point we have an observable of observables – which we need to flatten in order to pass it to subsequent stages – that’s the role of the flat_map stage.
  • Lines 8-9 are repetitions of the same pattern described above – note how at each stage we add the key to the emission of the child observable and flatten the observables into a single stream for the next stage
  • Line 10: we call our final “map” to transform the results into tuples as shown in our diagram above
  • Line 11: generic subscriber function

It’s a good exercise to:

  • really understand the difference between flat_map and map
  • understand how to pass variables to child observables via the use of a “map nested in flat_map” pattern
  • ask yourself: did it really make the code more concise? Do you still find it easier to reason in imperative terms?
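For the first two points, it may help to see the pattern without any reactive library at all. The sketch below swaps observables for plain Python iterables: the flat_map helper is my own stand-in built on itertools, not the RxPY operator, and the data and the “child” field name are invented. It only illustrates the shape of “map nested in flat_map”.

```python
from itertools import chain


def flat_map(func, items):
    """Apply func (which returns an iterable) to each item, then flatten one level."""
    return chain.from_iterable(func(item) for item in items)


def children(bucket):
    """The child buckets of one bucket ("child" is a made-up field name)."""
    return bucket["child"]["buckets"]


outer = [
    {"key": "a", "child": {"buckets": [{"key": "x"}, {"key": "y"}]}},
    {"key": "b", "child": {"buckets": [{"key": "z"}]}},
]

# map alone: one inner collection per outer bucket, so the result stays nested.
nested = list(map(lambda b: [c["key"] for c in children(b)], outer))

# map nested in flat_map: the inner map closes over the parent bucket b, so
# every child carries its parent's key, and flat_map merges the inner
# sequences into a single stream.
pairs = list(
    flat_map(lambda b: map(lambda c: (b["key"], c["key"]), children(b)), outer)
)

print(nested)  # [['x', 'y'], ['z']]
print(pairs)   # [('a', 'x'), ('a', 'y'), ('b', 'z')]
```

The inner lambda is what passes the parent’s key down; flat_map only takes care of the flattening.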