Problem, solved.

On Saturday, October 14th, 2017, I took part in a Voice UI Hackathon hosted at the Burda Bootcamp in Munich and powered by Amazon and Maxdome.

The context

Maxdome, the number four video streaming service in Germany with an 11% market share, behind Amazon Prime Video (32%) and Netflix (17%), had opened its API to the hackathon participants so that Alexa could query Maxdome’s metadata. The purpose of the hackathon, although not officially stated, was to control Maxdome’s video services by voice on an Alexa device.

How it started

I had already attended some local voice UI events, but I had never taken part in a big hackathon before. This one was fully booked, with more than 120 participants. At the hackathon, I bumped into the father of one of my daughter’s friends from kindergarten, who happens to be a senior Java developer. We had a quick chat about video services. I told him how unhappy I was that my daughter had bought some video content on Amazon Prime Video without my consent, and he told me how shocked he was by the violence in some of the 30-second advertisements shown while he was simply watching cartoons with his son on YouTube. We decided to team up.

Logically, we started to think about what could be done to prevent children from watching inappropriate video content. Maxdome already has a security PIN code system in place, but the code needs to be entered manually, which is not a pleasant user experience. What if Alexa could become the gatekeeper?
Obviously, saying your PIN code out loud in front of your family is not an option, so I needed to find something cleverer: some kind of voice-based CAPTCHA system.

The making of the team

It was time to form our teams of four and get started. I went on stage and presented the idea to the audience. Some people raised their hands to join the team. I interviewed them and selected two more people: an interaction designer and a JavaScript developer. Both were motivated and showed a positive attitude.

On a flipchart I started to brainstorm with my team. I wanted us to focus on how a machine could differentiate an adult from a child.

Voice Signal Frequency recognition?

The average man’s speaking voice typically has a fundamental frequency between 85 Hz and 155 Hz. A woman’s speech range is about 165 Hz to 255 Hz, and a child’s voice typically ranges from 250 Hz to 300 Hz and higher. This was not an option because it was too complex to implement during a hackathon.
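To make the idea concrete, here is a minimal sketch of what a naive pitch-based check could look like, using only the ranges quoted above. It is not something we built: the function name `classify_by_pitch` and its return labels are hypothetical, and it assumes the fundamental frequency has already been estimated by some other component. Note that the quoted ranges overlap between 250 Hz and 255 Hz and leave a gap between 155 Hz and 165 Hz, which already hints at why thresholds alone are not enough.

```python
# Fundamental frequency ranges in Hz, as quoted in the text above.
ADULT_MALE = (85.0, 155.0)
ADULT_FEMALE = (165.0, 255.0)
CHILD_MIN = 250.0


def classify_by_pitch(f0_hz: float) -> str:
    """Naively label a speaker from an already-estimated fundamental frequency.

    Hypothetical helper for illustration only. Returns one of
    "adult", "child", "ambiguous" or "unknown".
    """
    # 250-255 Hz is claimed by both the female and the child range.
    if CHILD_MIN <= f0_hz <= ADULT_FEMALE[1]:
        return "ambiguous"
    if f0_hz > ADULT_FEMALE[1]:
        return "child"
    if ADULT_MALE[0] <= f0_hz <= ADULT_MALE[1]:
        return "adult"
    if ADULT_FEMALE[0] <= f0_hz < CHILD_MIN:
        return "adult"
    # Below 85 Hz, or in the 155-165 Hz gap between the quoted ranges.
    return "unknown"
```

Even with a reliable pitch estimate, the "ambiguous" and "unknown" cases would still need a fallback, so this approach would not have worked on its own.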

Voice recognition?

This had been available in the US for only a few days and was not yet available in Europe. It also requires training and is not practical when the authorized person is not at home.

Security question?

Children learn to count only once they start school, so a question such as “How much is 4 + 5?” can only be answered by someone aged 6 or above: perfect for FSK 6 (content recommended for ages 6 and up). So I suggested we ask questions that only a person of a certain age can answer. We then defined further questions for FSK 12, FSK 16 and FSK 18.
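The security-question idea can be sketched roughly as follows. This is an illustration rather than our hackathon code: the names `Challenge`, `CHALLENGES` and `is_allowed` are hypothetical, only the FSK 6 question comes from the text, and the entries for FSK 12, 16 and 18 are explicit placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    question: str
    answer: str


# One challenge per FSK rating. Only the FSK 6 entry is from the text;
# the other entries are placeholders, not the questions we actually used.
CHALLENGES = {
    6: Challenge("How much is four plus five?", "9"),
    12: Challenge("<placeholder question for FSK 12>", "<placeholder>"),
    16: Challenge("<placeholder question for FSK 16>", "<placeholder>"),
    18: Challenge("<placeholder question for FSK 18>", "<placeholder>"),
}

# Spoken answers may arrive as words or digits; map the word we need.
_NUMBER_WORDS = {"nine": "9"}


def normalize(spoken: str) -> str:
    """Lower-case and trim a spoken answer, mapping number words to digits."""
    text = spoken.strip().lower()
    return _NUMBER_WORDS.get(text, text)


def is_allowed(fsk: int, spoken_answer: str) -> bool:
    """Return True if the answer unlocks content with the given FSK rating."""
    if fsk == 0:
        return True  # FSK 0 content needs no challenge
    if fsk not in CHALLENGES:
        raise ValueError(f"unknown FSK rating: {fsk}")
    return normalize(spoken_answer) == CHALLENGES[fsk].answer
```

For example, `is_allowed(6, "nine")` passes the gate while `is_allowed(6, "eight")` does not. A real skill would also need several questions per rating, so that children cannot simply memorize the answer.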

We picked some movies for each category from the Maxdome catalog and got started.

Setting the objectives and starting to code

Our objective was to present a working prototype, and we only had a few hours left, so I defined everyone’s to-do list. Two of us would work on the back end, dealing with the query and the API, while the interaction designer and I would define the interaction flow and the actions that fulfill a user’s spoken request (called “intents”), and research how people, and especially children, consume video content.
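For readers who have never seen one, here is a minimal sketch of how a back end might route an intent. It is not our prototype: it assumes the standard Alexa JSON request and response layout, and the intent name `PlayMovieIntent`, the slot name `title` and the spoken texts are all hypothetical.

```python
def build_response(text: str, end_session: bool) -> dict:
    """Wrap plain text in the Alexa custom-skill response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event: dict, context=None) -> dict:
    """Route an incoming Alexa request to a spoken reply."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        return build_response("What would you like to watch?", False)
    if request["type"] == "IntentRequest":
        intent = request["intent"]
        # "PlayMovieIntent" and the "title" slot are assumed names.
        if intent["name"] == "PlayMovieIntent":
            title = intent.get("slots", {}).get("title", {}).get("value")
            if title:
                return build_response(
                    f"Before I play {title}, please answer a question.", False
                )
            return build_response("Which movie would you like to watch?", False)
    return build_response("Sorry, I did not understand that.", True)
```

The session stays open after the gatekeeper prompt so that the next utterance can be treated as the answer to the security question.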

A few Red Bulls and pizza slices later, I organized a status update to see what would be achievable and what we would need to leave out. Setting up the environment had already cost us a lot of time, and nobody in the team had experience with “AWS Lambda”, the serverless compute service commonly used to host the back-end code of Alexa “skills”. I sat with the developers to see where they were stuck and helped them find workarounds that were solid enough for a demo.

Since the presentation format was a four-minute pitch on stage, I also put the presentation together. Quick online research revealed not only that 62% of parents say age-inappropriate content is their top concern, but also that 63% of teens believe that accidentally accessing inappropriate content online is an issue.

The final pitching

The final countdown came as we were just testing our prototype. Now was the time to present it on stage. There were about 22 teams, since some participants had already given up during the day. We presented our idea almost last, when the jury was already getting tired, so I engaged with the audience to wake them up a little. Everyone in the team played a role: the developers set up the demo, and we had the jury test it themselves. It went flawlessly.

Conclusion

Many other teams had built impressive prototypes, many had known each other for a long time and many already had coding experience with the Amazon Alexa SDK. We didn’t. However, we had a solid business case, we were addressing a real user need and we were interacting with the system for real.

The jury came back a few minutes later and announced us as the winner of the competition.