Whether you are skeptical or not, revolutionary speech technology has already found its way into our everyday lives.
Speech technology is everywhere, and it is already having a decisive impact on our personal and professional lives. This rapid adoption goes far beyond asking Alexa to add milk to your shopping list: it spans calls to your bank, transcribed Zoom meetings, subtitles for the deaf community, fighter-jet cockpits, and the transcription of 999 calls, police interviews, and court proceedings. The market continues to expand as more use cases emerge across a wealth of industries, and speech technology is swiftly becoming deeply embedded across society.
But with all these perks, many enterprises still hesitate to implement innovative speech recognition. To find out why, the Cybernews crew reached out to Katy Wigdahl, CEO of Speechmatics – the most accurate and inclusive speech-to-text engine on the market today.
How did Speechmatics originate? What has your journey been like throughout the years?
Speechmatics’ story began back in the 1980s, when our founder, Dr. Tony Robinson, pioneered the approach of applying neural networks to speech recognition, demonstrating that neural networks greatly outperform traditional systems. Today’s computing power, along with the rise of graphics processing and cloud computing, now makes the huge potential of this approach a reality, and we believe we will soon reach the point where computers truly understand us.
Carrying on Dr. Robinson’s legacy, we are pushing the boundaries of Autonomous Speech Recognition in over 34 languages, and we are working with some of the largest blue-chip companies in the world to change the way they work, with accuracy and speed.
Can you introduce us to what you do? How is AI incorporated into your solutions?
Our API provides customers with the most accurate speech-to-text powered by AI. Speechmatics aims to increase inclusivity in AI technology and lower AI bias until all voices are understood.
As innovators, we build systems and models to improve on our already high accuracy levels, which is why we're focused on making our Autonomous Speech Recognition (ASR) the next big thing. ASR has proven to be a major breakthrough in the field: by introducing self-supervised learning to train our models faster, we significantly outperformed tech giants such as Amazon, Google, Microsoft, and Apple in tackling inequality and AI bias. Essentially, this machine learning method means our AI model can learn from much wider audio datasets, because it doesn’t rely exclusively on data labeled by humans, which is costly, time-consuming, and far from comprehensive. Instead, it uses some property of the data itself to construct a supervised task, without the need for human intervention.
Traditional speech recognition – also known as ‘Automatic’ Speech Recognition – relies on consuming large quantities of audio data, but this has typically been achieved using narrow datasets, all of which have been meticulously labeled by humans.
However, our ASR, which uses the latest techniques in machine learning, including the introduction of self-supervised models, takes accuracy to the next level. The result is the most powerful, inclusive, and accurate speech recognition ever released.
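To make self-supervision concrete, here is a minimal sketch of one common pretext task, masked-frame prediction: random frames of unlabeled audio features are hidden, and the model is trained to reconstruct them, so the audio itself supplies the supervision. This is an illustrative assumption, not Speechmatics' actual training pipeline (whose details are not public); the function name and shapes are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_masked_frame_task(features: np.ndarray, mask_prob: float = 0.15):
    """Build a self-supervised (inputs, targets, mask) triple from
    unlabeled audio features: hide random frames and ask the model
    to reconstruct them. No human-produced labels are involved;
    the audio itself supplies the training signal.

    features: (num_frames, feat_dim) array, e.g. log-mel frames.
    """
    mask = rng.random(features.shape[0]) < mask_prob
    inputs = features.copy()
    inputs[mask] = 0.0           # hide the masked frames from the model
    targets = features[mask]     # the model must predict these frames
    return inputs, targets, mask
```

A model trained to minimize reconstruction error on `targets` can consume vast amounts of unlabeled audio and is then fine-tuned on a comparatively small labeled set, which is what makes the approach cheaper and broader than fully supervised training.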
In your opinion, which industries should be especially concerned with implementing speech recognition solutions?
Since Speechmatics was founded, we have seen a huge appetite for our technology, which can be used in a variety of commercial scenarios, including media and entertainment; contact centers, customer relations, and business intelligence; financial services; government; and edtech. For example, speech technology can be used in media monitoring to set live triggers on chosen keywords, or for real-time and pre-recorded subtitling. Turning speech into text also enables contact centers to analyze their audio content and understand the mood, tone, and overall sentiment of customers, supporting continuous improvements in customer experience.
For the broadcast industry, speech recognition solutions are particularly valuable because regulations require that content is made accessible to all, and accurate, real-time transcription is key to enabling this.
We recently added Entity Formatting functionality to our Autonomous Speech Recognition (ASR) software, which brings us and our customers closer to tackling one of speech recognition's biggest challenges: interpreting numbers. The feature is especially useful in industries where getting numbers right in the transition from speech to text is essential. It uses Inverse Text Normalisation (ITN) to more accurately transcribe numbers, including percentages, dates, times, and currencies. For example, the software can differentiate between ‘pounds’ as a currency and ‘pounds’ as a unit of weight.
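As an illustration of what Inverse Text Normalisation does, here is a toy rule-based sketch: it rewrites runs of spoken-form number words as digits and attaches a percent sign where appropriate. Production ITN, including Speechmatics' implementation, relies on far richer grammars and contextual disambiguation; the word list and rules below are simplified assumptions for demonstration only.

```python
# Toy spoken-word-to-value table; a real system covers far more vocabulary.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
    "ten": 10, "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
}

def words_to_int(words):
    """Sum simple tens/units words, e.g. ['twenty', 'five'] -> 25."""
    return sum(NUMBER_WORDS[w] for w in words)

def itn(spoken: str) -> str:
    """Rewrite spoken-form numbers as digits, e.g. 'twenty five percent' -> '25%'."""
    tokens = spoken.lower().split()
    out, i = [], 0
    while i < len(tokens):
        # Gather a maximal run of consecutive number words.
        j = i
        while j < len(tokens) and tokens[j] in NUMBER_WORDS:
            j += 1
        if j > i:
            value = words_to_int(tokens[i:j])
            if j < len(tokens) and tokens[j] == "percent":
                out.append(f"{value}%")   # attach the unit to the digits
                j += 1
            else:
                out.append(str(value))
            i = j
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)
```

For instance, `itn("inflation rose by twenty five percent")` yields `"inflation rose by 25%"`. The hard part a real system must solve, and this sketch does not, is ambiguity: whether 'pounds' after a number means currency or weight depends on context.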
How did the recent global events affect your field of work? Were there any new challenges you had to adapt to?
Two years of pandemic and the resulting lockdowns have completely changed the working environment. Despite lockdown restrictions being lifted, many businesses continue to implement flexible and hybrid working arrangements. Teams continue to be dispersed and online channels are used to communicate all types of information, from casual daily chats to sensitive information.
This resulted in an increased demand for speech recognition technology as it was used to fortify product offerings and services to meet the challenges created by the pandemic. Research conducted for our 2021 Trends report found that 93% of respondents believed that the demand for collaboration tools would continue after the pandemic.
When the COVID-19 pandemic hit, we continued to follow the business plan we had set for Speechmatics. Since we had a clear vision for the company, we all had a ‘north star’ that was understood and easily followed by any member of the team. Thankfully, we have always had a flexible approach to how we deliver that vision, so when the pandemic hit and the world seemed more unpredictable than ever, we leaned on this flexibility to keep us motivated and united.
When facing such unstable circumstances, good leadership teams need to become laser-focused. Limited resources and market uncertainty mean there is little room for error, so gathering the right data from across the business to make better decisions should be a priority. We have always been proud of our transparency about what is going on in the business, but we felt that during such uncertain times it was even more important: our employees were already living with restrictions in their daily lives, and we did not want the status of the business to add to their anxieties.
Why do you think companies often hesitate to try out new and innovative solutions that would enhance their business operations?
Integrating brand-new external software into a company’s existing stack is a daunting prospect for many companies, and businesses often don’t fully understand whether a software tool is right for them, or whether the promised solution will be helpful, until it has been tested.
In the speech recognition industry, how we define and measure accuracy is a contested issue, which in turn influences companies’ attitudes towards adopting the technology. Indeed, our 2021 Trends report showed that 73% of respondents found accuracy to be the biggest barrier to adopting speech technology in their business, and 51% thought accent or dialect recognition issues were a barrier.
However, over the last few years, speech technology has improved to the point where the world's most spoken languages, namely English, Spanish, French, and German, are now transcribed with high accuracy as measured by word error rate (WER). As the evidence mounts for the value of integrating new technologies like speech recognition, we expect that those who hesitate to take on innovative solutions risk being left behind.
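For context, word error rate is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. A minimal sketch of the standard dynamic-programming calculation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else float(len(hyp))
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

A WER of 0.05 means roughly one word in twenty is wrong. Vendors' headline accuracy figures are typically 1 − WER on a chosen test set, which is why the choice of test data (accents, dialects, audio quality) matters so much when comparing engines.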
As the world gets more connected, what security tools do you think every company and individual should have in place to keep themselves safe?
Speech recognition technology can play an important role in security operations. On-premises deployments are the most secure way of consuming the technology: they allow customers’ data to be processed and stored in their own data centers, reducing the risk of data breaches that could come with offsite storage.
There is inherent security value in getting voice data out of audio and video and into written form. It helps companies analyze transcripts for irregularities and allows them to be proactive in mitigating potential breaches.
In your opinion, where can we expect to see speech recognition solutions be used more often in the near future?
As humans, we are designed to communicate through voice. Using speech recognition technology in business is already adding great value, either by optimizing the conversation itself or by improving the workflow around it. Most people, be they businesses or consumers, have encountered speech recognition in some form, for instance through their bank or mobile phone.
In addition to boosting productivity, we will also see speech recognition solutions be used a lot more to increase accessibility and inclusivity in a wide range of businesses and services.
What other aspects of our daily lives do you hope to see improved by new advancements in technology?
AI can be applied to almost any aspect of our lives, from insurance to hospitality. It’s not a technology we should be afraid of but one we should embrace as it is designed to enhance our lives. Incorporating AI into different sectors means that customers get a better experience and often even better outcomes, as humans are freed from repetitive mundane tasks and have more time to focus on activities that require a human.
Using AI in insurance or healthcare can mean that conclusions are reached quicker, be it around a customer’s policy or which treatment is best for a patient. The speed at which AI takes in information, analyzes it, and then considers all variables can make a hugely positive difference in how we reach conclusions.
Would you like to share what’s next for Speechmatics?
Our plans for the future are to continue to grow and take on the big tech players while reducing bias in AI. Even though we launched our Autonomous Speech Recognition technology less than a year ago, we will continue to test ourselves against our competitors while innovating to ensure we deliver the best speech recognition on the market.
We will be putting a particular focus on the accuracy of transcription for all voices and in all languages, so that speech recognition doesn’t just work for some, but for everyone.
We will be announcing some exciting projects and new customers in the next few months, so stay tuned!