Recently many people have asked the question, “If I strap GPT-4 onto a phone call, will it work?” Given how advanced LLMs have become, you might think the answer is a resounding yes. Spoiler alert: It’s not. Luckily, the recent developments in transcription, text-to-speech, and large language models, have enabled - for the first time - for anyone to create a rudimentary AI calling system.
Today, we will define what an AI phone call is, explore what tools and technologies are available to build one, and then discuss the practical steps required to put one into production. By the end of this guide, you’ll understand how your startup, enterprise, or small business can start integrating AI phone calls right now. And for our more technical friends, stick around to see some working code examples that you can build a custom system off of!
Robocalls are the absolute plague primarily because they sound fake and are fully scripted. When you receive or make a phone call, and a tinny-sounding voice answers, you know you’re about to waste the next 10 minutes of your life dealing with an incompetent phone system - when all you want is to speak with a helpful human.
AI agents are the antidote. Because language models are trained on a wide range of subjects and can be fine-tuned on human conversations, they’re exceptionally good at holding engaging discussions, following instructions, and offering help. When super-powered with additional context - like your purchase history and size preferences - LLMs can match the best customer support agent, salesperson, and even therapist.
Let’s break conversations down to a granular level and examine how they work. In this example, let’s say Paul is talking to Sam.
Paul: says some stuff about startups
Sam: listens to Paul’s advice
Sam: interprets Paul’s statement and generates a response (likely as a spoken tweet)
Sam: vocalizes his response
The steps to building an AI phone calling system are similar:
Unfortunately, when you chain those three components - the speech recognition, language, and text-to-speech models - the output is useless! Foundational models are ineffective until you provide them with heuristics for understanding human conversation. The next step is to add logic to detect when the counterparty finishes speaking and differentiate interruptions from affirmations (ex. “hold on!” and “hmm”). This logic sounds simple but gets complicated - especially when you throw in multiple speakers and background noise.
Yacine I. - AKA Kache on twitter - recently created Talk, an open source repository for AI conversations.
Having reviewed the processes that underly AI phone calls we can now examine a practical application. Here is an example that takes a speaker's input, processes it using an LLM, generates a response, and then converts to speech
Let’s explain that *large* block of code in simple English:
So, fantastic, let’s assume that we’ve now built an AI phone calling agent - an AI personal assistant - that can make phone calls to anyone for any task. Can my business just start using it?
Well, there are a couple of other problems you need to consider:
The combination of these two problems has prevented many of the "voicebot" tools we’ve seen to date from progressing from hackathon projects to production-ready applications deployed in the real world. If your agent is too slow or doesn’t have guard rails, customers will have naturally bad experiences. And if you don’t have a system for defining and tracking your agent’s behavior, you’ll never be able to spot and fix issues when they occur.
At Bland, we’ve built the API for AI phone calling. We use optimized foundation models, custom infrastructure, and in-house observability tools, enabling developers to easily add low-latency AI calls to both existing and new applications.
Whether you want to build an AI personal assistant, AI customer support representative, or even an AI secretary to schedule meetings, you can build it all with Bland. Click here to get started!
Serving sectors including real estate, healthcare, logistics, financial services, alternative data, small business and prospecting.
Serving sectors including real estate, healthcare, logistics, financial services, alternative data, small business.