We are living in an age of smart devices, where voice assistants have become the access point for controlling a wide range of functions. Popular examples include Amazon's Alexa, Google Home, Apple's Siri, and Microsoft's Cortana. Most new smart products launched today also ship with built-in voice assistant features.
According to Gartner, the sales of voice-first devices are estimated to top $2 billion by 2020, which means that they will have a significant role to play in the future of smart devices.
One of the key areas of focus now is skill development for voice-first devices. Amazon's Alexa already has 10,000 skills, and with custom skills being added daily, Alexa keeps growing more capable of handling diverse queries.
As skills are being developed, there is also a need to test these skills thoroughly so that they can deliver an optimal user experience. Functioning of the devices may seem simple on the surface, but in fact there are many challenges that determine the success of the skill.
Two factors determine the success of a voice-first app or device: speed and accuracy. Most of the challenges that voice assistant testers face revolve around these two factors. Now let's try addressing some of these challenges.
Challenges in voice-first skill testing
High volume of datasets: Operating a voice-first device like Alexa involves the user saying the "wake" word, followed by a command or instruction, and the device/application responding with the appropriate action. Though on the surface this may look like a simple function, digging deeper reveals many layers of complexity.
Apart from the regular types of testing methodologies – unit, system, integration, performance, endurance, etc. – we also need to ensure that voice apps or devices have been tested for many different commands/utterances for the same skills.
The number of test cases can spiral beyond any reasonable estimate. For instance, to order pizza you might say, "Alexa, can you order 2 large cheese pizzas please?" or "Alexa, order a thin crust pepperoni pizza." In both cases, the voice app/device is expected to process the request and produce the same result. Likewise, the same instruction can be framed in different ways: "Alexa, switch on the light" versus "Alexa, turn the light on."
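The many-phrasings problem can be sketched as an intent-consistency check: every surface form of a request should resolve to the same intent. The `resolve_intent` function below is a toy keyword-based stand-in (an assumption for illustration, not any platform's real NLU API), used only to show the shape of such a test:

```python
# Toy stand-in for the assistant's NLU: a real test would call the
# voice platform's endpoint instead of this keyword router.
def resolve_intent(utterance: str) -> str:
    text = utterance.lower()
    if "pizza" in text:
        return "OrderPizzaIntent"
    if "light" in text:
        return "LightOnIntent"
    return "UnknownIntent"

# Different phrasings of the same request must map to the same intent.
cases = {
    "OrderPizzaIntent": [
        "Alexa, can you order 2 large cheese pizzas please",
        "Alexa, order a thin crust pepperoni pizza",
    ],
    "LightOnIntent": [
        "Alexa, switch on the light",
        "Alexa, turn the light on",
    ],
}

for intent, utterances in cases.items():
    for u in utterances:
        assert resolve_intent(u) == intent, (intent, u)
print("all utterance variants resolved consistently")
```

The value of structuring tests this way is that adding a newly observed phrasing is a one-line change to the dataset, not a new test case.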
Another thing testers need to keep in mind is that these services tend to evolve over time. Their responses may change due to knowledge gained over interactions as part of personalization services. This adds another layer of complexity.
So how do you ensure that you cover the different ways a question can be spoken and still derive the correct answer? Skill testing requires large datasets spanning the many phrasings of each question, so that the appropriate response can be verified no matter how the question is framed.
However, it is practically impossible to generate such an exhaustive list manually. Testers cannot cover such high volumes when each sprint may last only a few days.
Creating datasets based on different domains: Each domain has a different set of core questions or needs that its services revolve around. Manually generating questions for each domain is a very tedious task and, at the same time, does not guarantee coverage of all relevant questions and answers. It also takes a huge amount of time to understand what challenges or questions a customer would actually want to raise.
This is where testers need help in creating datasets that cover domain-specific questions and the required answers.
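One common way to attack this (a sketch of the general technique, not eInfochips' specific method) is template-and-slot expansion: a handful of domain templates combined with slot values generate many domain-specific utterances automatically. The templates and slot values below are illustrative:

```python
# Expand slot templates into a domain-specific utterance dataset.
# Templates and slot values here are illustrative examples for a
# food-ordering domain.
templates = [
    "order {size} {topping} pizza",
    "can you get me a {size} {topping} pizza",
]
slots = {
    "size": ["small", "large"],
    "topping": ["cheese", "pepperoni"],
}

dataset = [
    template.format(size=size, topping=topping)
    for template in templates
    for size in slots["size"]
    for topping in slots["topping"]
]

# 2 templates x 2 sizes x 2 toppings = 8 utterances
print(len(dataset))
```

The dataset size grows multiplicatively with each slot, which is exactly why automation rather than manual authoring is needed once a domain has more than a few slots.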
Language and accents: People obviously speak different languages in different regions, so voice assistants need to understand and answer queries in multiple languages. How, then, do we test skills in various languages? This is another challenge when selling devices across the world, where personalizing the experience is the most important factor. Testers cannot be expected to know every language in which the device needs to be tested.
Another factor that matters in voice-first testing is accent: even when the language is the same, accents can vary significantly. A language like English, spoken the world over, requires much more testing because accents differ from region to region. Identifying the various accents and testing against them to derive accurate results is therefore challenging.
Different age groups: A product sold worldwide must cater to a large audience spanning many age groups, since voice-first devices are not built for any specific age group. When you cater to customers of various ages, you also want to ensure a great user experience for all of them.
A young school kid will formulate a question differently from an elderly person, and their needs will differ too. The device therefore has to handle queries from all age groups, which means the test data must include questions representative of each group. This is where the tester needs to create that kind of dataset.
How to overcome these challenges?
The key ingredient that can help testers overcome these challenges is test automation. With large volumes of datasets, it is very difficult to rely on manual testing. The solution here is automation in voice-first testing.
eInfochips has developed a fully automated voice testing framework called VAQA (Voice Assisted Quality Assistance). This framework enables automated testing of various voice assistants. VAQA framework supports automation of a variety of smart devices like smart speakers, headphones, smart home appliances, and home automation devices among many others.
It is capable of automating end-to-end use cases from device to cloud, including device configuration, environment setup, connectivity testing, speech translation and recognition, multi-lingual support, device functionality, device-to-cloud connectivity, and data generation. It therefore covers the challenges discussed above, and more.
This automation framework also acts as an accelerator, resulting in faster time to market and reduced operational costs for end customers, which directly influences the bottom line. A framework that accelerates the testing process also makes it possible to meet sprint expectations.
One of the key features of this framework is an AI-based Question & Answer model that generates test data automatically. The answer-matching feature ensures that you get accurate results from your test data and is robust enough to handle changes in voice assistant responses over time. Multilingual support is provided by automatically translating questions into various languages, and datasets can also be tailored to age groups and accents.
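To illustrate why tolerant answer matching matters (this is a simplified sketch, not the VAQA implementation), a response that drifts over time due to personalization can still be validated by comparing token overlap against an expected key phrase instead of exact string equality:

```python
import re

def tokens(text: str) -> set:
    """Lowercase word tokens, keeping in-word apostrophes."""
    return set(re.findall(r"[a-z']+", text.lower()))

def answer_matches(actual: str, expected: str, threshold: float = 0.5) -> bool:
    """Jaccard similarity of word sets; tolerant of rewording."""
    a, b = tokens(actual), tokens(expected)
    if not a or not b:
        return False
    return len(a & b) / len(a | b) >= threshold

# A reworded but equivalent response still passes...
assert answer_matches(
    "Your large cheese pizza has been ordered.",
    "I have ordered your large cheese pizza.",
)
# ...while an unrelated response fails.
assert not answer_matches("Sorry, I didn't catch that.", "Your pizza is ordered.")
print("answer matching ok")
```

The threshold is a tuning knob: raising it tightens the check toward exact matching, lowering it tolerates heavier rewording at the risk of false passes.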
To know more about how this works, please check out our free webinar:
Enhancing Voice Assistants using VUI, NLP and Cognitive Voice Testing
eInfochips is a pioneer in developing some of the best IoT products deployed all across the world. We develop IPs and frameworks that accelerate test cycles and shorten time to market. If you are building a voice assistant or need help with cognitive QA services, please get in touch with us.