ALEXA: That could be great, if only……


Well, again breaking with tradition, this article will be in English. In short, it’s about Amazon’s semi-awesome concept of doing voice processing in the cloud and triggering different actions when certain keywords are recognized. I came in touch with it at Amazon’s Echo workshop in Munich on 3/23/17 and ordered an Echo Dot right away to play with it.

1. Preface

A word on technology and to technology-driven companies in general: Dear product owners, for the love of god, please have your company and the techs in the basement create technology that makes life easier. Rant commencing…

  • I do not want to tell the intelligent microphone my life’s story before a light turns on.
    • „Alexa, lights on“ = good. „Alexa, please turn the lights on“ = bad.
    • „Oven, 180 degrees, hot air“ = good. „Alexa, please set the oven to 180 degrees and switch on the hot air program“ = very bad.

    The „bad“ variants are way too long of an interaction. Amazon, your competition is the human pressing a button in like 0.2 seconds. Be better than that.

  • Why do you make it so hard for developers to actually build great stuff?! Smart home appliances that only talk to closed-off cloud systems, the Alexa system grabbing voice input for some control and smart home patterns before I can process it in my custom skill, …
  • And lastly, please stop patronizing your customers. You never know exactly or completely what your customer wants. Let the early adopters play with your product. It’s beneficial for both of us: we get our dream solution, you get publicity and one or two ideas for free!

By the way, I’m not the first to address this, see his slide #10 from April 2016, nearly one year ago. He has EXACTLY the same points.

Now, let’s see how far we get with the Alexa product. And yes, I will not anthropomorphize it – it’s not a she… yet.

2. System Architecture

Not particularly complicated, the system architecture.

  1. A great microphone array is always open and listens for a specific phrase, e.g. „alexa“.
  2. After that, all input is streamed to the cloud service for about 8 seconds.
  3. Audio data is processed by a „spoken language understanding“ thingy (probably a large ANN bought from Poland) for human voice, also known as speech-to-text or voice recognition. Actually, here the spoken input is also separated into an „intent“ (= program to call) and „slot variables“ (arguments to said program).
  4. Now the intent module (of a particular „skill“) is called, does some action and generates a spoken response that is sent back to the Echo. I’m not certain if the Echo does the voice synthesis on its own hardware, or if it just plays back an audio stream.
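
To make step 3 a bit more concrete: an intent with its slots is really just a function call with named arguments. Here is a minimal Python sketch of that idea. The intent names match the phrases listed in section 3.2; the handler functions and the `dispatch` helper are made up for illustration and say nothing about how Amazon implements this internally.

```python
from typing import Callable, Dict


# Hypothetical handlers: each "intent" is a program, each "slot" an argument.
def lights_off(room: str) -> str:
    return f"Lights off in {room}"


def set_color(lightsname: str, color: str) -> str:
    return f"{lightsname} set to {color}"


# Intent name -> program to call (names are assumptions for illustration).
HANDLERS: Dict[str, Callable[..., str]] = {
    "LightsIntent": lights_off,
    "LightsColorIntent": set_color,
}


def dispatch(intent: str, slots: Dict[str, str]) -> str:
    """Call the program behind an intent, passing slot values as arguments."""
    handler = HANDLERS.get(intent)
    if handler is None:
        return "Sorry, I do not know that command"
    return handler(**slots)
```

The returned string stands in for the text that would be spoken back by the Echo.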

Ideally, the Echo device would not need LAN access at all and just have the cloud computers calculate voice responses and other actions.

Amazon’s Echo System Architecture (c) by Donn Morrill, Amazon

Also note that for now, they say no Echo device is allowed to send commands to the local network. I don’t know if that’s still true, since some people say it does a local SSDP broadcast to search for devices during smart home discovery.

The awesome parts are that:

  • voice-to-text processing seems to work extremely well, with both German and English,
  • even with music playing from the Echo speaker, the system will usually recognize my input,
  • it seems to cope with different speakers really well.

The less-awesome parts are that:

  • Amazon has put many hooks in place which grab user input and route it to Amazon systems (e.g. for music playback, weather, traffic information, …),
  • customizing reactions to voice commands is cumbersome and limited at the moment („custom skills“),
  • it’s a very cloud-driven solution.

While I accept that for voice processing, there are so many use cases that cannot be generalized for the public and are only valid for single installations. In order for this system to grow beyond the toy factor, Amazon will need to open up to the development community much more. See the rant above. And also the bad customer reviews for the majority of the skills…

3. Skills and Skills…

In order to make use of the voice command system, you’ll need to create your own „skill“ (a bunch of callable programs) with one or more „intents“ (programs to call). However, there are different skill types:

3.1 Smart Home Skill

  • automatically reacts to any phrases typically met in such a context („turn on“, „turn off“, „dim up“, „dim down“, …)
  • has predefined intents („on“, „off“, „dim to“, „dim up“, „dim down“, „set temperature“, …) – no other commands, such as color, program or similar
  • mandatory OAuth account linking (you can use Amazon OAuth, „Login with Amazon“)
  • can only directly call AWS Lambda functions, not your own HTTPS server. However, the AWS Lambda function can in turn call a script on your dyndns’d home server, where you do the translation into local REST requests.
  • no custom responses, only „OK“ or „error with skill“
  • Afaik, no back and forth supported („Alexa, are all lamps off?“)
  • Afaik, only one lamp per input (no „turn on bathroom and kitchen“)

Unfortunately, it’s not possible to define rooms or other criteria (e.g. the floor of the building) for the voice engine to match against. For now, it only accepts a name.

I don’t know what’s best practice. Given a lamp in a room, I named it e.g. „living room“, so I can go „Alexa, living room off“. If I had two lamps in the living room, I wouldn’t know what to do, since I cannot say „turn off lamp one in the living room“, afaik. They say that a device’s additional information [1-4] is not used by the voice processor.
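
The Lambda-to-home-server relay mentioned in the list above could look roughly like this. It is a sketch only: the host name `myhome.example.org`, the script name `alexa.php` and the query parameter names are all hypothetical, and the part that pulls device name and command out of the Smart Home request is left out. Certificate verification is switched off on the assumption that the home server uses a self-signed certificate.

```python
import ssl
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical dyndns name and script path of the home server.
HOME_SERVER = "https://myhome.example.org/alexa.php"


def build_relay_url(device: str, command: str, value: Optional[int] = None) -> str:
    """Encode device name and command ("on", "off", "setPercent") as a GET query."""
    params = {"device": device, "command": command}
    if value is not None:
        params["value"] = str(value)
    return HOME_SERVER + "?" + urllib.parse.urlencode(params)


def relay(device: str, command: str, value: Optional[int] = None) -> int:
    """Send the GET request to the home server and return the HTTP status code."""
    # Verification is disabled because of the self-signed certificate;
    # this is a convenience for a hobby setup, not a recommendation.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    url = build_relay_url(device, command, value)
    with urllib.request.urlopen(url, context=context, timeout=5) as resp:
        return resp.status
```

The home server script then only has to map `device` and `command` onto the local REST request for the matching gateway.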

3.2 Custom Skill

In order to overcome these difficulties, I then tried the Custom Skill variant. And boy, did this look great at first.

  • Custom HTTPS script callable (JSON data in PHP’s $HTTP_RAW_POST_DATA)
  • Upload a self-signed certificate
  • Have the Echo speak any response you want; you define it in a text response.
  • No AWS Lambda required
  • custom phrases („utterances“), custom intents and slots; you automatically get the closest match from a list of choices

Being more intelligent than the system, I chose „lights“ as the activation word of my skill and was looking forward to building the best voice command structure ever. So, I fed it all the phrases I could think of:

  • LightsIntent off in {room}
  • LightsProgram program {number} for {lightsname}
  • LightsColorIntent color to {color} for {lightsname}
  • …you get the idea…
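
To illustrate how such phrase templates turn into an intent plus slot values, here is a toy matcher in Python. It is only a sketch: it does exact template matching with regular expressions, whereas the real service picks the closest match in some fuzzier way I have no insight into. The function names are made up; the three templates are the ones from the list above.

```python
import re
from typing import Dict, Optional, Tuple

# Utterance templates in the style of the list above; {name} marks a slot.
TEMPLATES = {
    "LightsIntent": "off in {room}",
    "LightsProgram": "program {number} for {lightsname}",
    "LightsColorIntent": "color to {color} for {lightsname}",
}


def template_to_regex(template: str):
    """Turn 'off in {room}' into a regex with one named group per slot."""
    # re.split with one capture group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        pattern += f"(?P<{part}>.+?)" if i % 2 else re.escape(part)
    return re.compile(f"^{pattern}$", re.IGNORECASE)


def match(text: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (intent, slots) for the first template that fits, else None."""
    for intent, template in TEMPLATES.items():
        m = template_to_regex(template).match(text.strip())
        if m:
            return intent, m.groupdict()
    return None
```

For example, `match("program 3 for desk lamp")` yields the intent `LightsProgram` with the slots `number` and `lightsname` filled in.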

Ideally, I’d activate my custom skill with „Alexa, lights“ and then one of the custom phrases with one or more slots would get sent to my server, which in turn would issue the appropriate REST commands to the appliances.

Unfortunately, this only worked in the simulator in the Alexa development console.

As I said before, everything remotely sounding like a smart home related command, even the unsupported color commands, is not sent to your custom skill. Instead, the Echo will tell you that there are no Smart Home devices or that this command is not available for the specific device.

3.3 The best of both worlds

Alright, so, in order to still have a usable system, I created both a Smart Home Skill and a Custom Skill.

  • The Smart Home Skill would control some lamps I have and the heating. I’d put everything with regards to discovery into the AWS Lambda and forward the extracted device name and command arguments („on“, „off“, „setPercent“) via REST to my own server, ignoring certificate verification.
  • The Custom Skill would control the sophisticated lamps (offering light programs and colors via a local REST interface). It is activated by „Scheinwerfer“, German for floodlight, one of the words the Smart Home engine does not react to.
    Since it can talk to my server directly, only the processing changes on my server: I don’t get a GET request, but rather a POST with a lot of JSON data in the POST body. After doing a JSON decode, I can issue the appropriate command to the lamp’s gateway.
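
The JSON handling for the Custom Skill can be sketched in a few lines of Python. The field names (`request`, `intent`, `slots`, `outputSpeech`, `shouldEndSession`) are how I remember the Alexa Skills Kit JSON interface and should be checked against the current reference; the two function names are made up, and talking to the lamp’s gateway is left out.

```python
import json
from typing import Dict, Tuple


def parse_intent_request(body: str) -> Tuple[str, Dict[str, str]]:
    """Extract intent name and slot values from the JSON POST body."""
    request = json.loads(body)["request"]
    if request.get("type") != "IntentRequest":
        raise ValueError("not an IntentRequest")
    intent = request["intent"]
    slots = {
        name: slot["value"]
        for name, slot in intent.get("slots", {}).items()
        if "value" in slot  # slots that were not spoken carry no value
    }
    return intent["name"], slots


def build_response(text: str) -> str:
    """Wrap a plain-text answer in the JSON envelope sent back to the service."""
    return json.dumps({
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    })
```

The server script would call `parse_intent_request` on the raw POST body, issue the command to the gateway, and return `build_response("OK")` or a more talkative answer.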

Even though Amazon probably doesn’t want this, I’d wish for:

  • Disable their smart home commands – not possible at the moment afaik
  • Create a custom skill that can have a reserved activation word („lights“) – not possible at the moment afaik
  • Put my own intents and type fields there – works
  • Define an AWS Lambda suite to process my requests – works
  • Put my own intent handlers there – works
  • Give an intent handler the option of defining a LAN-local URL („http://192.168.20.10/turnon“) to call – not possible at the moment afaik
    • Echo would relay the request in my LAN,
    • and return the response to my AWS Lambda function
    • The Echo could relay discovery, control and query operations
  • Enable an intent handler to define a response text – not possible at the moment for Smart Home skills
  • Enable more attributes for Smart Home devices
    • room
    • maintenance group (DE: „Gewerk“)

With this, possibly more smart home owners could use their own systems and gateways from so many different vendors with the Alexa system without going through the FHEM hassle.

4. Some caveats I’ve noticed so far…

  • Bought the latest album „Under Stars“ by dear Amy from Amazon into my account; however, the system would only play excerpts if told to play the „latest album by amy mcdonald“. Telling it the album name directly worked.
  • Mixed language input – essential for music control – is at a very early stage for now. Even more so if you have funny spelling, such as „(r)Evolution“ by Hammerfall.
  • Nearly everything sounding like a smart home command, i.e. having the words „lamp“, „light“, „LED“, etc., in it, will go to the smart home processing engine. And since that is very limited at the moment, e.g. no color commands, no program numbers, I cannot build the awesome smart home voice interface I was hoping for.
  • For the smart home ecosystem, you cannot define rooms, but see above.
