Ruuvu: building an Alexa Skill for IMDb ratings with alexa-app

We’ve enjoyed using our Amazon Echo for the past few months. Its built-in features provide some useful and fun capabilities, but the availability of the Alexa Skills Kit promises to really help the device maximize its potential. ASK allows third-party developers to build “skills” to add all-new features to Echo. Naturally, I couldn’t help but take a stab at building an Alexa Skill myself.

The code is available in the Ruuvu GitHub repo.

In our household, when we have time to kick back with a pay-per-view movie, it can be hard to sift through all the different movies, many of which we’ve never heard of. We often find ourselves consulting IMDb for ratings before making a selection. I thought it would be great to be able to ask Alexa for the ratings, rather than getting out our phones and searching with the on-screen keyboard. Out of the box, the Echo doesn’t have the ability to get ratings like that. You can say something like “Alexa, who starred in ‘Oblivion’?”, and Alexa will perform wonderfully. But there’s no support for getting IMDb ratings. Here’s an opportunity for an Alexa Skill.

So the first thing I did was read up on the Skills Kit to get an idea of how to approach the project. After reading some tutorials like this one, I opted to use an AWS Lambda function.

I use AWS extensively, but I was unfamiliar with Lambda. Now that I know more about it, I’m pretty impressed with what it provides. Lambda will run your code in response to events (like changes to files on S3) or HTTPS requests (so your code is like a simple web service), and more importantly, it scales up compute resources for you. So it’s a lot simpler than standing up your own code on load-balanced EC2 instances that you have to manage. And your first 1M requests each month are free, so it’s a great place to get started.

Lambda lets you write your function in Java or Node.js. I opted to go the JavaScript route, and I decided to use Matt Kruse’s alexa-app node module to interface with Alexa. I thought his API made the whole ASK interface cleaner than the AlexaSkills.js sample code provided by Amazon. And the best part of alexa-app is the auto-generation of the intent schema and the sample utterances: there is a simple syntax for specifying utterance variations in a compact fashion, letting the code expand out all the combinations of different phrasing.

To get started, I created a directory for my project. In this directory, I created an index.js file to contain my lambda function. I then pulled in local copies of the node modules I needed:
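(Something like the following; the exact package used for Levenshtein distance is up to you, fast-levenshtein is just the one shown here.)

    npm install alexa-app
    npm install omdb
    npm install fast-levenshtein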

(I’m glossing over this a bit — I didn’t know exactly how I was going to interface with IMDb data when I started, and I didn’t know I needed Levenshtein until later in the project).

In my index.js, I pulled in the alexa-app module and added a launch handler. My skill name is “Ruuvu” — don’t ask me why — it was the product of some brainstorming with my kids, and it was what the group liked best. It’s definitely better than making the user say “Alexa, ask ‘IMDB Ratings Search’ for the ratings for …”. Skill names should be short and sweet.
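Here’s a minimal sketch of the module setup and launch handler (the prompt wording is illustrative):

    var alexa = require('alexa-app');
    var app = new alexa.app('ruuvu');

    app.launch(function(request, response) {
      // spoken when the user says "Alexa, open Ruuvu"
      response.say('Welcome to Ruuvu. Ask me for the ratings of any movie.');
      // don't end the session; the second argument is the reprompt
      response.shouldEndSession(false, 'You can ask me something like: what are the ratings for Another Earth?');
    });

    // expose the app as an AWS Lambda handler
    exports.handler = app.lambda();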

When the user says “Alexa, open Ruuvu”, Alexa will speak the message in the response.say() call. Because I’m telling Alexa to not end the session, she will wait for more speech from the user. The second argument to response.shouldEndSession() is a reprompt — if the user doesn’t say anything for a little while, Alexa will speak that message. If the user still doesn’t say anything, Alexa will exit the skill.

Now we need to get Alexa to do something interesting — we need to define an intent. Let’s start with the HelpIntent.
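A stripped-down version of it looks like this:

    app.intent('HelpIntent', {
      utterances: ['help']
    }, function(request, response) {
      response.say('You can say something like what are the ratings for Another Earth');
      // stay in the skill so the user can follow up with a real query
      response.shouldEndSession(false);
    });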

Now if the user says “Alexa, open Ruuvu” and then says “Help”, the HelpIntent function will be kicked off, and Alexa will respond with “You can say something like what are the ratings for Another Earth”. The glue that maps the user’s speech to the intent handler is the utterances array. If Alexa recognizes anything in this list, the intent handler will be triggered. This intent is triggered only by the single word “help”. Other intents may need to respond to a wider variety of utterances in order to provide a natural user experience. We’ll get to that in our next intent handler.

You’ll notice that the help intent handler I ultimately built is a little more sophisticated. It provides two samples, and it randomly generates the samples based on alternate utterances and random movie titles.

Now that we’ve got a launch handler and help intent handler, we can focus on the heart of the skill — actually finding IMDb ratings for movies. We can define a RatingsIntent.
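Here’s a trimmed-down sketch of the definition (the full dictionary and utterance list are in the repo; the titles and the second utterance below are just examples):

    app.dictionary = {
      'movie_names': ['The Shawshank Redemption', 'The Godfather', 'Another Earth',
                      'Harry Potter and the Prisoner of Azkaban', 'Inception'
                      /* ...39 titles in all... */ ]
    };

    app.intent('RatingsIntent', {
      slots: { 'TITLE': 'LITERAL' },
      utterances: [
        '{what are|what is|get} the {ratings|rating} {for|of} {movie_names|TITLE}',
        '{how good|how bad} is {movie_names|TITLE}'
      ]
    }, function(request, response) {
      lookup_movie(request.slot('TITLE'), response);
      return false; // asynchronous handler; we'll call response.send() ourselves
    });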

Notice that the utterances are quite a bit more intricate than the simple “help” utterance we deal with for the help intent handler. I have tried to anticipate all the ways a user might ask for movie ratings. Alternatives are enclosed in braces and pipe-delimited. alexa-app will expand them out into all the possible combinations.

For this intent, we also define a slot, called “TITLE”. This is the part of the utterance we can’t exactly predict. This is where the user would specify a movie title. The best we can do here is provide Alexa with samples of the kind of input the user might provide. So we take advantage of an alexa-app dictionary we define called movie_names. I have seeded that with IMDb’s top 25 movies of all time, along with some of my personal favorites, for a total of 39 different titles. Amazon’s documentation on these samples is a little sparse. The idea isn’t to provide every single title that the user might say, but to provide a representative sample. For example, you should provide samples of movie titles with a wide variety of word lengths. You’ll notice that the titles of the movies I’m providing have anywhere from one to eight words. This seems like a fairly reasonable range to cover most titles.

To generate the intent schema and the full list of utterances, you can use alexa-app’s schema() and utterances() methods.
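One way to wire that up (the --schema and --utterances flags are just for illustration):

    // at the bottom of index.js
    if (process.argv.indexOf('--schema') !== -1) {
      console.log(app.schema());      // the JSON intent schema
    }
    if (process.argv.indexOf('--utterances') !== -1) {
      console.log(app.utterances());  // the fully expanded sample utterances
    }

    // then, from a shell:
    //   node index.js --schema > schema.json
    //   node index.js --utterances > utterances.txt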

alexa-app will expand the utterances by plugging in every single combination of the alternate phrases. For example, with the first utterance string in the intent definition, alexa-app will generate 3 * 2 * 2 * 39 = 468 utterances. This would be highly impractical to code by hand, so this is where alexa-app really shines.

The intent handler is called whenever the user says something that matches one of these utterances. The movie title is accessible via the slot:
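    var title = request.slot('TITLE'); // whatever Alexa heard for the movie title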

We pass this slot value to the lookup function and return false, indicating that this is an asynchronous intent handler. The lookup_movie() function makes calls to the OMDB API to search for the movie, find the best match (using the Levenshtein algorithm), retrieve the ratings for the best match, and read them back to the user. There is error handling built into this function as well, so if the OMDB calls fail for some reason (they often do; I think they need a little more compute horsepower, because their API can be very slow), we can give the user a meaningful response.
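Here’s a compressed sketch of that flow (the real function does more error handling and builds a richer response; the omdb and fast-levenshtein calls reflect how I remember those modules working, so treat the exact signatures as approximate):

    var omdb = require('omdb');
    var levenshtein = require('fast-levenshtein');

    function lookup_movie(title, response) {
      omdb.search(title, function(err, movies) {
        if (err || !movies || movies.length === 0) {
          response.say("Sorry, I couldn't find any ratings for " + title).send();
          return;
        }
        // pick the search result whose title is closest to what Alexa heard
        var best = movies[0];
        var bestDistance = levenshtein.get(title.toLowerCase(), best.title.toLowerCase());
        movies.forEach(function(candidate) {
          var distance = levenshtein.get(title.toLowerCase(), candidate.title.toLowerCase());
          if (distance < bestDistance) {
            best = candidate;
            bestDistance = distance;
          }
        });
        // fetch full details (including the IMDb rating) for the best match
        omdb.get({ title: best.title, year: best.year }, true, function(err, movie) {
          if (err || !movie) {
            response.say('Sorry, I had trouble looking that one up.').send();
            return;
          }
          response.say(movie.title + ' has an IMDb rating of ' + movie.imdb.rating).send();
        });
      });
    }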

Note that we can respond back to the user in two ways: speech and in-app cards. You may have noticed that your phone’s Echo app provides a list of its responses to the queries you’ve run. Your Alexa Skill can generate these cards with its response data:
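(The card text here is illustrative; alexa-app’s card() takes a title and the card content.)

    // spoken response plus a card that shows up in the companion app
    response.say('Another Earth has an IMDb rating of 7.0');
    response.card('Another Earth (2011)', 'IMDb rating: 7.0');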

So while you are generating the spoken response, you can also generate text for the card, and use the say() and card() methods to send both back to the user.

Session management

One thing I was concerned about is session management. Often the user will launch the skill with an intent (e.g. “Alexa, ask Ruuvu what are the ratings for American Sniper”) and expect the skill to provide the response and immediately shut down.

Other times, the user might open the skill with no intent (“Alexa, open Ruuvu”) and then make multiple queries. In that second case, we want to keep the skill’s session open. You’ll notice that in the launch handler, I set a session variable called open_session. We only enter the launch handler when the user opens the skill with no intent, so that’s when open_session gets set to true. Whenever we respond to the user, we check this session variable, and based on its value we can direct Alexa to keep the session open. By default, the session is closed after the response is delivered, but you can keep it open like this:
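(The respond() helper below is just for illustration; the session getter and setter are alexa-app’s.)

    // in the launch handler: remember that the user opened the skill with no intent
    app.launch(function(request, response) {
      response.session('open_session', true);
      response.say('Welcome to Ruuvu. Ask me for the ratings of any movie.');
      response.shouldEndSession(false, 'You can ask me for the ratings of any movie.');
    });

    // called by the intent handlers when they're ready to respond
    function respond(request, response, message) {
      if (request.session('open_session')) {
        // the user opened the skill explicitly, so keep listening for more queries
        response.shouldEndSession(false, 'Anything else?');
      }
      response.say(message).send();
    }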

Recognition considerations

Voice recognition is a tricky problem. Alexa does a remarkably good job of it in most cases. After a lot of experimentation, I’m fairly certain that the dictionary that drives the voice recognition is seeded not only with standard words and phrases, but also a fairly extensive dictionary of pop culture terms. For example, I was surprised that it recognized “Harry Potter and the Prisoner of Azkaban” with no errors at all. “Azkaban”??? Really? But if there’s a dictionary of popular movies, musicians, song and album titles, etc., then this shouldn’t be such a big surprise.

There were a few types of queries that gave my software a lot of trouble:

  • numbers – Alexa will return numbers as strings, e.g. “iron man three”; OMDB wants the digits, e.g. “iron man 3”
  • names or made-up words – these kinds of words are misrecognized entirely, making searching impossible; however, if the movie was very popular, Alexa seems to recognize it
  • uncommon words – Alexa just could not get the word “Diviner” in the title “The Water Diviner”
  • awkward phrasing or punctuation:
    • what can you do with a title like “Tak3n”?
    • Alexa recognizes “the amazing spiderman”; OMDB wants it as “the amazing spider-man”
    • “E.T. the extra-terrestrial” was all kinds of problematic
    • roman numerals give the system fits
  • duplicates – there are many movies with the same name, e.g. “The Runner”. How do you know which one the user wants? This isn’t really a recognition problem, but it affects the query accuracy for the end user.

I’ve been able to compensate for some of these things. For example, I’m using code to convert number words into digits. I have biased the search results toward more recent movies, so you get the 2015 movie “The Runner”, not the 1999 one (you could argue whether that is a good idea). And I have an “AmbiguousMovies.js” library that maps Alexa’s recognition to good OMDB query strings. But it would take a lot of work to build that out fully and maintain it.
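Conceptually, that library is just a lookup table; the entries below illustrate the kinds of fixes involved rather than its actual contents:

    // AmbiguousMovies.js: map what Alexa hears to what OMDB expects
    module.exports = {
      'the amazing spiderman': 'the amazing spider-man',
      'e t the extraterrestrial': 'e.t. the extra-terrestrial'
    };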

I would love it if I could do some sort of constrained recognition — if I could tell Alexa that the TITLE slot should contain movie titles, it might be able to consult a huge database of titles and then names and made-up words in more obscure titles might not be such a big issue.

One note about the OMDB API code — I had to modify the omdb node module to get metacritic and award data (https://github.com/misterhat/omdb/pull/11). The author is making changes, but I’m not 100% certain where this will end up.

Deploying

We will deploy a ZIP file with all our code to AWS Lambda.
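Assuming index.js, node_modules, and any helper files (like AmbiguousMovies.js) all live in the project directory, building the ZIP is a one-liner; the archive name is arbitrary:

    zip -r ruuvu.zip index.js AmbiguousMovies.js node_modules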

Generate the intent schema and the sample utterances as shown earlier.

Now create the Lambda function on AWS and upload your code:

  • log into the AWS console
  • select “Lambda”
  • click “Create a Lambda function”
  • in the blueprint filter, type “alexa”
  • click on “alexa-skills-kit-color-expert”
  • the selected event source should be “Alexa Skills Kit”
  • click “Next”
  • enter name and description; select “Node.js” as the runtime
  • for “Code entry type”, select “upload a .ZIP file”; select your zip file
  • leave the handler alone
  • for “Role”, choose “Basic Execution Role”
  • a new window will pop up; accept the defaults and choose “Allow”
  • you can probably leave the memory setting alone; I bumped my timeout to 30 s due to slow performance from OMDb
  • click “Next”
  • review your settings and click “Create function”
  • on the resulting screen, you should see the ARN, something like this: arn:aws:lambda:us-east-1:149949863341:function:yourname; you will need this later

You can test the lambda function even before you connect it to Alexa. This is probably a good idea. You can grab some sample request JSON from the template.js file in the alexa-app github repo. Just modify the intent requests to match your schema. Example:
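(The IDs and timestamp below are placeholders; the part that matters for testing is the intent name and slot value.)

    {
      "version": "1.0",
      "session": {
        "new": true,
        "sessionId": "SessionId.example",
        "application": { "applicationId": "amzn1.echo-sdk-ams.app.example" },
        "user": { "userId": "amzn1.account.example" }
      },
      "request": {
        "type": "IntentRequest",
        "requestId": "EdwRequestId.example",
        "timestamp": "2015-10-01T12:00:00Z",
        "intent": {
          "name": "RatingsIntent",
          "slots": {
            "TITLE": { "name": "TITLE", "value": "another earth" }
          }
        }
      }
    }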

To run this request, click on “Actions” and choose “Configure Sample Event”. Paste in the JSON. Lambda will run your function and show you the JSON response.

Once you’re happy with the Lambda function itself, you need to define the skill in the Alexa developer portal:

  • open the Alexa developer portal
  • if you haven’t already signed up, do so; use the email address associated with the Amazon account connected to your Echo; note that if you have used “switch accounts” to switch to the account of another member of your household, you’ll have to switch back before you can test your skill
  • click “Get Started” under “Alexa Skills Kit”
  • click “Add a new skill”
  • enter the name (as it will show up in the Alexa app); in my case, it is “Ruuvu”
  • enter the invocation name (as it will be spoken by the user when he/she says “Alexa, ask INVOCATION_NAME what is the …”); I used “Roovoo”, thinking that might be easier for Alexa’s recognition engine
  • for endpoint, select “Lambda ARN” and paste in the ARN of your lambda function
  • click “Next”
  • paste in the intent schema and the sample utterances derived earlier
  • click “Save” and make sure there are no errors reported
  • click “Next”

You should now be able to test the skill by saying “Alexa, ask Ruuvu what are the ratings for American Sniper.”
