A virtual personal assistant (VPA) or voice assistant is a digital assistant that combines artificial intelligence, machine learning, speech recognition, Natural Language Processing (NLP), speech synthesis, and various actuation mechanisms to sense and influence the environment.
A VPA uses the vehicle’s microphones to receive the user’s voice (input), and speech recognition converts the speech signal into digital data. Speech software then analyzes this data using NLP and matches it against a database, using a speech algorithm, to determine the output. This database is located on distributed servers in cloud networks, so most personal assistants may require a reliable internet connection to function efficiently. The output, produced as text, is converted to speech using text-to-speech and played through the vehicle’s speakers.
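The flow described above (speech recognition, NLP analysis, database lookup, text-to-speech) can be sketched as a chain of four stages. This is only an illustrative sketch: the class `VoicePipeline`, the toy stage functions, and the intent names are all hypothetical stand-ins, and a production VPA would call real speech and cloud services at each step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical four-stage VPA pipeline; each stage is a pluggable function."""
    recognize: Callable[[bytes], str]    # speech signal -> text
    understand: Callable[[str], str]     # text -> intent label (NLP step)
    respond: Callable[[str], str]        # intent -> reply text (database lookup)
    synthesize: Callable[[str], bytes]   # reply text -> audio (text-to-speech)

    def handle(self, audio: bytes) -> bytes:
        text = self.recognize(audio)
        intent = self.understand(text)
        reply = self.respond(intent)
        return self.synthesize(reply)


# Toy stand-ins (assumptions): the "audio" is just UTF-8 text, and the
# "database" is a small in-memory dictionary instead of a cloud service.
def toy_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_understand(text: str) -> str:
    words = text.lower().split()
    if "temperature" in words or "warmer" in words:
        return "hvac.adjust"
    if "navigate" in words:
        return "navigation.start"
    return "unknown"


REPLIES = {
    "hvac.adjust": "Adjusting the temperature.",
    "navigation.start": "Starting navigation.",
}


def toy_respond(intent: str) -> str:
    return REPLIES.get(intent, "Sorry, I did not understand that.")


def toy_synthesize(reply: str) -> bytes:
    return reply.encode("utf-8")


pipeline = VoicePipeline(toy_recognize, toy_understand, toy_respond, toy_synthesize)
print(pipeline.handle(b"Navigate to the office"))
```

Keeping each stage behind its own function makes it easy to see where the cloud dependency enters: in this sketch only `respond` stands in for the remote database.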
Natural language processing (NLP): NLP helps software understand text and spoken words in much the same way humans do. It is a branch of artificial intelligence, drawing on linguistics, that deals with the interpretation and manipulation of human speech or text using software. By combining machine learning, deep learning, and statistical models, NLP enables the system to understand natural human communication.
Depending on the VPA development process, integration, and functionalities, automotive VPAs can be segmented as:
Vehicles were conventionally deployed with speech recognition systems, offering limited functionalities. In 2004, Honda and IBM collaborated to introduce a conventional speech recognition system in vehicles, enabling users to control limited in-vehicle infotainment and HVAC functionalities with speech. Similarly, in 2007, Ford partnered with Microsoft to introduce “Sync”, permitting drivers to interact with cell phones and control music from in-vehicle infotainment or smartphone.
Further, in 2013, Apple introduced embedded Siri (CED VPA), and in 2017, Ford integrated Alexa. Along similar lines, virtual personal assistants have become an active and integral part of the connected ecosystem in current vehicles. Present-day VPAs leverage artificial intelligence and cloud connectivity to offer functionalities like infotainment controls, vehicle biometrics, navigation controls, cell phone controls, customizable wake-up words, emergency accident detection, and in-vehicle payment, among others.
While conventional speech recognition systems have existed in the industry for the past two decades, the majority of new models sold today are integrated with functionalities like Apple CarPlay and Android Auto, enabling VPA via CED. On the other hand, OEMs are integrating embedded VPAs and third-party VPAs to enhance user experience.
VPA developers and integrators also aim to improve road safety by decreasing the driver distraction caused by using functionalities like in-vehicle infotainment, navigational commands, and cell phones, among others. In 2020, Transport Research Laboratory published a report gauging driver distraction while interacting with Android Auto and Apple CarPlay. The study indicates that driver distraction levels while using voice assistants are much lower than while using touch screens. With driver distraction estimated to be one of the prominent causes in ~30% of vehicle collisions across Europe, the usage of voice assistants can significantly improve road safety.
Owing to the improvement in user experience and increased road safety, global VPA penetration in new vehicles is anticipated to increase continuously and reach ~90% by 2028. Additionally, VPAs are witnessing strong traction from consumers, highlighting strong demand dynamics. In 2019, a consumer survey conducted by Capgemini showed that ~37% of global respondents were willing to pay extra for VPAs in vehicles. Respondents from India (68%) and Norway (39%) showed the highest willingness to pay for VPA.
Using an automotive VPA differs from using one designed for the home. The in-car environment changes with the usage of the vehicle’s HVAC system and with increasing vehicle speed, both of which raise background noise. Background noise, distance from the speaker, and reverberation are prominent challenges associated with VPA performance.
Background noise: Unwanted sounds such as engine noise, road noise, and in-vehicle HVAC noise can be classified as background noise. Background noise adds unwanted content across the frequency range of the captured signal, which lowers the signal-to-noise ratio and can negatively affect overall voice recognition performance.
Distance from the speaker: When sound travels from a speaker’s mouth to the microphone, it experiences various transformations when reflecting off surfaces. The most prominent change is the reduction in overall signal level, leading to changes in the quality and intelligibility of speech.
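To put a rough number on the drop in signal level, the inverse-distance law for a point source in a free field gives a loss of about 6 dB for each doubling of distance. The sketch below assumes that idealized model. A car cabin is not a free field, so the figures are indicative only, and the function name and the 0.1 m reference distance are illustrative choices.

```python
import math


def level_drop_db(distance_m: float, reference_m: float = 0.1) -> float:
    """Drop in sound pressure level (dB) relative to the level at reference_m.

    Assumes a point source in a free field (inverse-distance law), which
    ignores the cabin reflections described in the text.
    """
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(distance_m / reference_m)


# Doubling the mouth-to-microphone distance costs roughly 6 dB.
print(round(level_drop_db(0.2), 2))
# Moving from 0.1 m to 1.0 m costs roughly 20 dB.
print(round(level_drop_db(1.0), 2))
```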
Reverberation: The combination of direct sound and multiple reflections from walls, ceilings, and other surfaces results in noisy, echoey, or reverberant speech. Reverberation can degrade overall VPA performance, particularly when the VPA was designed and tested in a reverberation-free environment.
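One common way to model the mix of direct sound and reflections is to convolve the dry speech with an impulse response that contains the direct path followed by weaker, delayed copies. The sketch below uses a made-up four-tap impulse response purely for illustration. Measured cabin responses are far longer and are not taken from the source.

```python
def convolve(signal, impulse_response):
    """Full discrete convolution: each input sample triggers a scaled copy of
    the impulse response, and the copies are summed."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[i + k] += s * h
    return out


# Hypothetical impulse response: direct sound, then two weaker reflections
# arriving two and three samples later.
impulse_response = [1.0, 0.0, 0.5, 0.25]

# A single click makes the smearing easy to see: the output is the click
# followed by its reflections.
click = [1.0, 0.0, 0.0, 0.0]
reverberant = convolve(click, impulse_response)
print(reverberant)
```

Applying the same function to recorded speech and a measured impulse response is one way to generate reverberant test material from clean recordings.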
Players like SoundHound, Cerence, Yobe, and Kardome, among others, are addressing these challenges with techniques such as beamforming (the use of multiple microphones), noise cancellation algorithms, voice identification, zone control with source separation, and echo cancellation.
Design approach: VPA technology providers and suppliers consider the impact of distance, background noise, and reverberation on voice recognition while designing and testing VPAs. Including data that covers the wide range of disturbances that can occur in cars during the design phase provides more representative data for training the neural networks of the VPA system.
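One way to bring in-car disturbances into training data is to mix clean speech with recorded noise at a chosen signal-to-noise ratio (SNR). The sketch below shows that idea with synthetic signals. The helper names and the 5 dB target are illustrative assumptions, not a description of any supplier's pipeline.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in samples) / len(samples)


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale the noise so that speech/noise hits target_snr_db, then add it.

    Returns the noisy mixture and the scaled noise.
    """
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (target_snr_db / 10)))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


# Synthetic stand-ins: a tone for "speech" and random values for cabin noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * n / 50) for n in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]

noisy, scaled = mix_at_snr(speech, noise, target_snr_db=5.0)
print(round(snr_db(speech, scaled), 3))
```

Repeating the mix over several SNR values and noise recordings yields training examples that span quiet and loud driving conditions.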
Beamforming (using multiple microphones): With microphones pointed toward the speaker, noise and reverberation arriving at the microphones from other angles are lowered. To reduce the effect of background noise, players in the VPA domain arrange multiple microphones into an array, which allows the speech recognition system to focus on a single direction and attenuate sound from other directions.
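The simplest form of this idea is delay-and-sum beamforming: each microphone signal is shifted to undo the speaker's arrival delay, and the aligned signals are averaged so that the speech adds up while uncorrelated noise partly cancels. The sketch below assumes the delays are known and are whole numbers of samples. Real systems have to estimate them, and the vendors named in the text may use different, more advanced methods.

```python
import math
import random


def delay(signal, samples):
    """Delay a signal by a whole number of samples, zero-padding the start."""
    return [0.0] * samples + signal[:len(signal) - samples]


def delay_and_sum(channels, delays):
    """Undo each channel's known arrival delay, then average the channels."""
    usable = len(channels[0]) - max(delays)
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(usable)
    ]


def error_power(estimate, reference):
    """Mean squared difference between an estimate and the clean signal."""
    return sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(estimate)


# Simulated four-microphone array: the same tone reaches each microphone with
# a different delay, plus independent noise at every microphone.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * n / 40) for n in range(2000)]
delays = [0, 2, 4, 6]
channels = [[s + rng.gauss(0.0, 0.5) for s in delay(clean, d)] for d in delays]

beamformed = delay_and_sum(channels, delays)
single_error = error_power(channels[0], clean)
beam_error = error_power(beamformed, clean)
print(single_error > beam_error)
```

With four microphones and independent noise, averaging is expected to cut the noise power to roughly a quarter, which is why arrays with more microphones can focus more sharply.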
Zone control with source separation: Separate zones allow VPAs to respond to voice commands from different places in the car. The separation algorithm isolates the driver’s voice signal clearly, even with passenger noise in the background. Similarly, passenger voice commands can be clearly separated from the driver’s voice using the individual microphones.
Voice identification/voice biomarkers: Dynamic noise environments, including crosstalk, can interfere with the accuracy and effectiveness of voice interfaces. Voice identification allows the VPA to identify and lock onto unique voice biomarkers, so that it processes only the voice of interest (the user’s voice). This technique enables VPA systems to execute highly accurate speech interactions with enhanced speech signals while keeping other background noises and intrusions from interrupting and affecting voice recognition.
Usage of linear noise reduction components: Nonlinear noise reduction components can make speech signals harder to process because they cause speech deterioration and speech signal deletion. Signal deletion may make a few frequencies disappear at random, leading to a loss of input. To reduce noise without deleting parts of the signal, linear noise reduction components can be used.
Integrating echo cancellers: When acoustic echo occurs, voice-enabled devices must differentiate between the user’s speech and the output from the speakers. An echo canceller permits this differentiation, and the accuracy of echo cancellation contributes to overall VPA performance.
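A classic way to build an echo canceller is an adaptive filter that learns how the loudspeaker output arrives at the microphone and subtracts that estimate. The sketch below uses the normalized least-mean-squares (NLMS) algorithm as one such filter. The echo path, tap count, and step size are made-up illustration values, and the simulation contains echo only (no simultaneous user speech), which is the easiest case.

```python
import random


def nlms_echo_cancel(mic, far_end, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic (NLMS)."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent loudspeaker samples, newest first
    residual = []
    for m, x in zip(mic, far_end):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = m - echo_estimate  # what remains after removing the echo
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual


def mean_power(samples):
    return sum(v * v for v in samples) / len(samples)


# Simulation: the loudspeaker plays random audio, and the microphone hears it
# through a short hypothetical echo path.
rng = random.Random(1)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, 0.3, 0.1]
echo = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]

residual = nlms_echo_cancel(echo, far_end)
# After the filter has adapted, very little echo should remain.
print(mean_power(residual[1500:]) < 0.01 * mean_power(echo[1500:]))
```

When the user speaks while audio is playing, practical cancellers typically slow or pause adaptation so that the user's speech is not mistaken for echo. That logic is omitted here.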
Automotive virtual personal assistants or voice assistants are becoming one of the amenities consumers prefer when evaluating current vehicles. While customer expectations of automotive VPAs remain similar to those of VPAs at home, noisy vehicle conditions pose far greater usage challenges than quiet homes.
Background noise, echo inside the vehicle, and distance from the microphone are prominent challenges that lower overall voice recognition performance. To match customer expectations, virtual personal assistant developers are working to address these performance challenges by adopting techniques like voice biomarkers, zone control with source separation, beamforming, and an altered design approach, among others.
To meet rising consumer interest, the majority of OEMs have already integrated VPAs into their production fleets by partnering with VPA suppliers, while the remaining OEMs are anticipated to follow a similar trend in the upcoming years. Additionally, with VPA developers working to enhance voice recognition performance and functionalities, we may expect an increasing number of partnerships between OEMs and VPA developers to integrate both embedded and third-party VPAs in the upcoming years.
Embedded VPAs may allow OEMs to customize the VPA to consumers’ needs and give OEMs greater control over users’ data, resulting in increased user privacy. On the other hand, third-party VPAs (such as Google and Alexa, among others) may offer advantages like user-friendliness, since the home and automotive VPAs are similar, but they lower the ability to customize voice solutions. Considering these factors, we expect upcoming vehicles to feature multiple VPAs to provide greater flexibility to consumers.