A few decades ago, conversing with a machine would have been science fiction.
Today, voice-activated technology is becoming increasingly common and accessible because it delivers numerous benefits in our daily lives. However, as with any new technology, you should understand how it works and how to use it, as well as the disadvantages and problems that may come with it.
How did Picovoice originate? What has your journey been like throughout the years?
I decided to start Picovoice when I was a senior engineer at Amazon. I noticed that voice technology was available only to a handful of multinationals. Most enterprises couldn’t afford it. Most developers didn’t have access to state-of-the-art models. Even if you could afford it, you had to go through a long and exhausting sales process.
I founded Picovoice to “build the developer-first platform for adding voice to anything.”
Today, what developers can do with Picovoice in a day used to take months, if not years. Picovoice serves early-stage startups, Fortune 500 companies and government organizations. Equally important, we’re democratizing voice AI. Picovoice’s free tier enables thousands of developers to build fully private, custom voice experiences. It’s been a challenging but rewarding experience.
Can you introduce us to your platform? What are its key features?
Picovoice is the developer-first voice AI platform. It provides the tools for developers to add voice to anything on their terms, and there are so many voice products they can build. Think about Alexa: with Picovoice, you can have your own custom assistant, say “Cyber News”, for your website. You can transcribe an hour-long podcast in minutes, or build a Google-like search engine that finds queries within your audio library without converting it to text.
Picovoice differentiates itself with its unique on-device voice recognition technology. Voice products built with Picovoice technology are private, cost-effective, and reliable, with zero latency. Its portfolio includes Speech-to-Text, Speech-to-Index, wake word, Speech-to-Intent, and voice activity detection engines. Its stack can run on anything from embedded devices to web browsers and servers, providing an immersive experience that no FAANG company can offer.
What would you consider the main challenges developers face today?
Regarding voice AI, I think developers don’t know what they can achieve with voice. That’s understandable, because Big Tech hasn’t offered much. By comparison, if you want to change the colour of your website today, a web developer can do it in minutes.
However, if you need to iterate on your voice model, for example to add another wake word, training that wake word can take weeks. The same goes for privacy; we had a conversation with Dr. Lam from Stanford University on this subject. Most developers and consumers don’t know that they can have both convenience and privacy.
Voice AI models can be both accurate and efficient. Our market is dominated by Big Tech, and since Big Tech doesn’t offer accurate, efficient and private voice models, most developers don’t know what is technically possible. Today, adding voice to anything on your terms is possible. One does not need to sacrifice privacy, accuracy or reliability, or bear high costs, to build voice products.
How did the recent global events affect your field of work? Were there any new challenges you had to adapt to?
COVID-19 has forced everyone to reevaluate their priorities and ways of doing business. Telehealth, caregiving services (such as eldercare), and voice-enabled touchless interfaces have gained popularity. Enterprises have shown more interest in industrial applications that improve labour productivity, given the staff shortages in industries such as warehouse management and hospitality. We’re talking to each other and doing business via online platforms, and we consume more content on streaming and social media platforms than through traditional channels. As a result, enterprises now have more audio data, which, once transcribed and indexed, can be turned into insights and used to improve the experience.
Voice AI might be new to many, but it’s more than just smart speakers or simple dictation. Recent events have accelerated the adoption of voice AI. We’re heavily investing in our team and technology to keep pioneering innovation in the market.
Why do you think companies often hesitate to try out new and innovative solutions that would enhance their business operations?
If you ask anyone in tech, you’ll hear generic answers related to infrastructure, employee readiness, priorities, and so on. For voice AI, however, there is an interesting additional one: stigma. Voice technology is not the same as it was five or ten years ago. Yet, as humans, we anchor ourselves to previous experiences. Companies don’t make decisions; humans do.
Remember the early conversations with Siri ten years ago? Even today, we sometimes wait for Alexa to play the next song for a minute or two. Additionally, Big Tech has a controversial history with privacy. Unfortunately, they’re not alone. Recent incidents have shown transcription startups “eavesdropping” on conversations too.
I wouldn’t want to risk my customer data or trade secrets either. Plus, it is expensive. When people hear that we’re not just a few percent cheaper but tens or hundreds of times cheaper, they’re surprised. Considering all of this, the stigma exists for a reason. However, it should change. We’re working hard to change it, because building private, accurate and cost-effective voice experiences is possible.
As the world gets more connected, what security tools do you think every company and individual should have in place to keep themselves safe?
I’m sure people can find tools depending on their industry and needs. However, one shouldn’t forget that tools mostly minimize risks; they don’t prevent them in the first place. For example, Google Speech-to-Text charges 50% more if you do not let them use your data, and some companies do not even give you that option. Once you share your voice data with a voice AI vendor, no tool can protect it. There is a famous saying: “If you're not paying, you're not the customer; you're the product being sold.” It’s a cliché, but it’s also true. Tools can do only so much. Be skeptical, ask questions and look for alternatives.
In your opinion, where can we expect to see voice recognition solutions being used more often in the near future?
So far, few corporations have been able to afford voice technology, and even those couldn’t fully utilize it due to cost or lack of accuracy.
For example, think about a call center or a multiplayer online game. They generate billions of seconds of audio per day. Monitoring and analyzing these conversations can improve the experience significantly. Yet organizations cannot send billions of seconds of audio to the cloud for transcription; it’s simply not feasible.
First, it is expensive: Google Speech-to-Text charges USD 36,000 for one million minutes. Second, it hurts the environment: every time you record and transfer data, it consumes electricity, and it all adds up. A few months ago, we showed that local transcription can achieve cloud-level accuracy, eliminating cloud-related costs and risks such as privacy and security.
This makes voice recognition accessible to everyone: not just startups with limited resources, but also large enterprises that do not have in-house machine learning researchers.
Which industries do you think should be especially careful when implementing voice recognition solutions?
As with any other software implementation, with voice you must do your due diligence. The healthcare and financial services industries, as well as some countries and regions, have strict regulations, e.g. GDPR, CCPA or HIPAA. Once you meet the regulatory requirements, you should consider vendor downtime and connectivity issues. If you are adding voice to cars, you should know that reception will be poor in some places.
If you’re adding voice to VR applications, users will, of course, think it’s your product that doesn’t work, not your voice vendor’s. It’s hard to earn customers’ loyalty and trust, but easy to lose them.
Would you like to share what’s next for Picovoice?
Our north star is to be the voice AI platform for developers. We are working to enable more developers and more applications. We have a packed roadmap with new products, features, and languages. Last week, we announced a startup program that offers a 90% discount; it follows the introduction of our Free Tier, which has no platform, product, SDK, or time limits.