How Posh AI built a banking voice agent that callers actually thank

Challenge

To support its rapid growth and increasingly complex banking flows, Posh AI sought a world-class infrastructure partner to offload foundational audio handling, allowing its engineers to double down on Posh’s industry-leading, banking-specific AI.

Solution

Posh offloaded audio streaming, speech-to-text, text-to-speech, and noise handling to Twilio’s ConversationRelay, freeing its team to focus on building intelligent, domain-specific banking flows.


For millions of Americans who bank with credit unions, community banks, and other mid-size financial institutions, calling customer service can be painful. Posh AI, a Boston-based agentic AI platform purpose-built for banking, wants to make it wicked good.

Founded in 2018, the company deploys an AI workforce for community banks and credit unions. That workforce handles everything from balance inquiries and loan application status to card activation and routine payments.

Posh built a team that deeply understands this industry, and the company quickly became the go-to platform for financial institutions and their customers alike. Posh powers 40 to 50 million conversations per year across its client base.

Along the way, Posh discovered that scaling a voice service savvy enough for sensitive financial interactions takes a lot of resources. It also creates a lot of headaches, because in banking there is no margin for error. For decades, the industry’s answer has been containment-focused IVR automation that frustrates customers and leaves them facing long waits to reach the human agents who handle more delicate calls.

Posh’s approach prioritizes resolution over containment.

By partnering with Twilio and choosing ConversationRelay to power voice, Posh found a way to pursue genuine resolution while freeing its engineers to focus on what truly differentiates the company.

Talk isn’t cheap. It’s challenging.

“As Posh AI scales to power millions of conversations, the team chose to offload foundational audio infrastructure to Twilio, allowing 100% of their engineering focus to remain on Posh’s proprietary, banking-specific AI,” says Greg Montemurro, Staff Product Manager at Posh.

For years, Posh ran its voice platform on a homegrown audio media service. It worked, but came with a cost that didn’t appear on any invoice: engineering bandwidth.

When something went wrong on a call, the team had to untangle whether the issue lived in the speech-to-text layer, the AI logic, an integration, or the infrastructure itself. Those errors carry real weight, especially in banking, where one mistake can lose a customer or, worse, create a liability. This got Montemurro thinking that offloading voice might be the most elegant way forward.

“If we were to take that off our backs, imagine all the FTEs we could free up to actually focus on our core differentiation,” Montemurro explains.

The reality: every engineer maintaining audio infrastructure was an engineer who wasn’t working on the banking-specific AI that makes Posh’s product pop. There was no time to waste. Montemurro and his team took a good look at Twilio’s ConversationRelay and knew it was more than just a voice product. It was the path to innovation.

"Our key differentiation isn’t literal audio infrastructure. It’s applying our deep domain knowledge of banking and fintech to serve end users."

Greg Montemurro, Staff Product Manager, Posh AI

Offloading audio, unlocking intelligence

It made sense, but would it work? 

For Montemurro, the decision to purchase ConversationRelay was a no-brainer. ConversationRelay could handle all the “audio stuff” (streaming, speech recognition, text-to-speech, noise cancellation, and more) while Posh channeled maximum effort into building the most advanced, regulated, domain-specific banking AI on the market.

“The [ConversationRelay] team has a great product that is able to do what we needed it to do and let us focus on our key differentiation,” says Montemurro. “We ended up scaling all of our audio streams to it.”

The improvements were immediately evident: 

  • Speech-to-text accuracy increased, meaning callers were more likely to be understood the first time. 

  • Turn-taking became more natural. No more bots talking over callers or awkward silences while the system decided if someone had finished speaking.

  • Text-to-speech voices became what Greg calls “significantly more beautiful,” elevating the entire feel of the interaction.

But the most significant impact came from the additional capabilities ConversationRelay unlocked. Posh could now deploy more advanced Operating Procedures for orchestrating complex banking flows. Instead of waiting for the AI to finish generating its entire response before speaking, the system could start speaking while the rest of the response was still being generated, making conversations feel faster and more human.
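To make the streaming idea concrete, here is a minimal Python sketch of what "speaking while still thinking" can look like from the application side. It assumes ConversationRelay's WebSocket protocol as described in Twilio's documentation: the app sends "text" messages that carry a "token" and a "last" flag, and Twilio converts them to speech. Please verify the exact message shape against the current docs. The names relay_text_messages and fake_llm_stream are hypothetical illustrations. They are not Posh's code or a Twilio SDK.

```python
import json
from typing import Iterable, Iterator


def relay_text_messages(tokens: Iterable[str]) -> Iterator[str]:
    """Turn a stream of model tokens into ConversationRelay-style "text" messages.

    Assumption: the message shape {"type": "text", "token": ..., "last": ...}
    follows Twilio's ConversationRelay WebSocket protocol. Each token is
    forwarded as soon as it arrives (last=False), so speech can begin before
    the full reply exists. A final empty token with last=True ends the turn.
    """
    for token in tokens:
        if token:  # skip empty chunks that some streaming model APIs emit
            yield json.dumps({"type": "text", "token": token, "last": False})
    # Tell the relay that this turn's response is complete.
    yield json.dumps({"type": "text", "token": "", "last": True})


def fake_llm_stream() -> Iterator[str]:
    # Hypothetical stand-in for a streaming language-model response.
    yield from ["Let me ", "check ", "that ", "for ", "you."]


if __name__ == "__main__":
    for message in relay_text_messages(fake_llm_stream()):
        # In a real server, each message would be sent over the WebSocket
        # connection Twilio opens to the app instead of being printed.
        print(message)
```

The generator forwards each token as soon as it arrives, so text-to-speech can begin on the first words instead of waiting for the whole reply. The closing message with last set to true signals that the turn is finished.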

Beyond the technology, Twilio’s ConversationRelay team is “probably the best partners we’ve ever worked with,” according to Montemurro. “Anytime we run into something that we're really stuck on, someone [from Twilio] jumps on [to help].” He continues: “A lot of times you buy something and then that one bug you can’t solve results in a week of work. It’s nice when that isn’t happening.”

"It feels great to build a product that people have wanted for so many years. We’re actually making people’s lives better versus just deflecting calls."

Greg Montemurro, Staff Product Manager, Posh AI

From deflecting calls to delighting customers

By design, Posh doesn’t embed post-call satisfaction surveys into its voice experience. But the team discovered a metric that speaks volumes: how often callers say “thank you” to the bot before hanging up. Since Posh moved to ConversationRelay and launched its enhanced AI layer, “Operating Procedures,” that number has grown four- to fivefold.

“Think about the last time you said thank you to an AI assistant,” Montemurro says. “People don’t really do that, but they do it when they’re pretty happy.”

With audio infrastructure off its plate, Posh’s engineering team can focus squarely on making the AI smarter about banking. The “Operating Procedures” framework lets the team design natural-sounding flows while keeping every action deterministic and compliant. The AI converses like a human, but under the hood, every account lookup, card activation, and balance check follows strict, auditable logic.

With Twilio, Posh has reclaimed the engineering bandwidth it needs to innovate in high-stakes banking use cases. Building on ConversationRelay, the team is now pioneering creative features such as rotating agent voices, so that every customer interaction feels fresh and localized. These Twilio-enabled features are helping reframe automated banking from a robotic hurdle into a genuinely personal, resolution-driven experience.

Better bank calls start here

With ConversationRelay, Posh is building something the banking industry has never had: resolution-focused voice automation that people love.

And the opportunity, Montemurro believes, extends beyond older demographics that have traditionally relied on phone banking. If the experience is natural enough and smart enough, there’s no reason younger generations won’t call their bank, too. 

“What if you called your bank and it just worked? I actually believe there would be a higher propensity to use these products if they worked the way we’re building toward,” Montemurro says confidently.

One call at a time, Posh and Twilio are turning banking’s biggest bottleneck into a praiseworthy experience.

