Kristina Shen
Ivory Tang

The 4 Waves of Voice AI

Y Combinator’s latest Call for Startups underscores the continued promise of Voice AI. Even in a crowded field, there’s opportunity for voice agents to drive value across enterprise and consumer applications. The shift is clear: voice is no longer the product—it’s becoming the enabler of broader enterprise functionality, expanding the total addressable market.

This post digs into actionable opportunities for Voice AI builders today, based on enterprise adoption patterns and infrastructure trends, across four commercialization waves:

  1. Infrastructure & Developer Platforms – Foundational models and orchestration layers for production-ready agents.
  2. Horizontal Platforms – Enterprise-ready tools requiring minimal technical lift for deployment.
  3. Vertical Agents – Domain-specific solutions targeting industry labor spend.
  4. Consumer Applications – Empathetic companions powered by hardware integration and on-device capabilities.

Call centers, a market estimated at $314.5B in 2022 and projected to reach $494.7B by 2030, provided the earliest and most obvious foothold for Voice AI. AI-native companies like Cresta, Parloa, Sierra, and Decagon are rapidly displacing legacy IVR systems. Vertically specialized players like EliseAI have also made headway in areas like real estate.
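
As a rough sanity check on the call center figures above, the implied annual growth rate can be worked out in a few lines. This is a minimal sketch that assumes smooth annual compounding over the eight years from 2022 to 2030; the function name `implied_cagr` is illustrative and not taken from any source.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market-size figures quoted in the text, in billions of USD.
cagr = implied_cagr(314.5, 494.7, 2030 - 2022)
print(f"Implied CAGR: {cagr:.1%}")
```

Under that assumption the market grows a little under 6% a year. That is steady rather than explosive growth, which is consistent with the argument that the larger upside sits in under-penetrated verticals.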

Voice agents now deliver 24/7 coverage, reduce costs, and improve customer experiences with minimal integration. As infrastructure matures, the next growth phase centers on domain-specific agents targeting under-penetrated verticals.

Where Vertical Voice AI Still Has Room to Run

Several verticals still offer substantial whitespace for Voice AI innovation:

Logistics

The logistics industry runs on communication, much of it high-volume, repetitive, and manual. That manual work accounts for more than $80B in annual payroll spent on tasks that could be automated. Startups are stepping into the void, building systems that can understand logistics-specific documentation and operations, with the long-term goal of replacing legacy TMS (Transportation Management Systems) and WMS (Warehouse Management Systems) with fully agentic platforms.

Fragmentation defines the industry. Freight forwarders, brokers, carriers, and shippers all play distinct roles, yet rely heavily on the same outdated systems. While global players like DHL, FedEx, UPS, and Maersk are deeply entrenched, they represent only a fraction of a market still dominated by mid-sized and regional operators. Long integration cycles with legacy TMS and a lack of technical resources remain hurdles, but startups that clear them can move fast and gain meaningful share.

We’re already seeing it happen. Companies like HappyRobot and FleetWorks are automating broker communications with owner-operators, streamlining everything from load updates to appointment scheduling. Augment has built ‘Augie,’ an AI teammate that follows SOPs and collaborates across email, phone, and APIs. Vooma supports quoting and freight scheduling across multiple channels for shippers and forwarders. And startups like Pallet and Hemut are rebuilding the core systems from the ground up—giving themselves a platform on which agents can natively operate.

The promise? A logistics stack that doesn't just react, but anticipates. These systems can optimize resources dynamically, adapt to market shifts in real time, and deepen relationships with partners through faster, smarter decisions.

Insurance

Manual tasks like policy renewals, claims processing, and compliance management continue to eat up time and introduce error risk. While large insurers like Allianz, State Farm, AIG, and Travelers shape the P&C market, most of the revenue still comes from smaller carriers, MGAs, and brokers—who often lack the scale to invest in meaningful automation.

Voice-first startups are gaining traction by focusing on the enterprise and mid-market. Strada and Liberate offer 24/7 call handling for sales and service, targeting larger orgs that bring bigger contract values, more volume, and better integration capacity. For startups, these customers are more lucrative—but smaller agencies can offer easier entry points.

The next frontier goes beyond call automation. Reserv and MarvelX are pushing toward autonomous claims processing and policy management. Integration with systems like Guidewire and Applied remains table stakes—but the deeper opportunity lies in expanding from voice to full-stack automation that reshapes insurer operations.

Healthcare and Pharma

Healthcare relies heavily on voice. Whether it involves patients, providers, or payers, most care coordination still happens over the phone. AI is changing that.

Assort Health, a Chemistry portfolio company, builds specialty-specific voice agents that handle millions of patient conversations annually. Hippocratic serves hospital systems, while Infinitus focuses on pharmacy interactions. These tools reduce hold times, cut down scheduling friction, and improve the overall patient experience.

For complex administrative workflows such as prior authorizations, pharmacy coordination, and appeals, companies like Tandem, Squad Health, and Tennr are building voice-enabled systems to streamline them. Voice becomes even more powerful when layered onto core platforms already solving paperwork and system integrations, helping reduce costs, boost transparency, and improve outcomes for patients and providers alike.

Manufacturing and Wholesale Distribution

In manufacturing and distribution, speed and accuracy are everything. Startups like Canals, Endeavor, and Distro are automating high-friction workflows like order entry—processing orders across phone, email, and even fax, no matter the format, and integrating into ERP systems.

Once orders are digitized, these platforms can upsell, cross-sell, and generate churn insights. Manufacturers benefit from added quoting tools, invoice reconciliation, and better document handling (e.g., BOLs). Proton.ai enhances sales through CRM automation, while Pepper focuses on food distributors with a voice-enabled inbox. Doss has built an end-to-end ERP system, combining inventory, orders, and production on a single data platform—laying the groundwork for full AI-native operations.

Home Services

For tradespeople like HVAC installers, electricians, and plumbers, the biggest pain point isn’t the work—it’s managing the business. Voice AI is starting to fill that gap.

ServiceTitan, the category-defining platform, has added Contact Center Pro to help automate booking and call handling. Newer players like Avoca, Sameday, and Netic are layering AI across scheduling, quoting, and dispatch—turning what used to be dozens of manual follow-ups into seamless, automated workflows. With ServiceTitan already embedded across the space, startups have a unique opportunity to build on top of an existing ecosystem and chip away at remaining labor costs through decision automation.

User Research

Traditional user research has long been a tradeoff: quick surveys versus slow, high-fidelity interviews. That tradeoff is disappearing.

Companies like Listen Labs and Strella now offer voice AI that replicates human-level interviews—probing, clarifying, and adapting in real time, but at scale. These tools collapse weeks of research into hours, transforming interviews from calendar-driven events into asynchronous, intelligent interactions.

This evolution is converging with the rise of synthetic research—AI agents simulating large populations to predict behavior. At Chemistry, we’re particularly focused on this area (see our previous post on synthetic research); we believe it will complement live voice AI by offering both signal and scale in product development and decision-making.

The Future is Both Local and Global

While enterprise adoption is accelerating, consumer Voice AI still faces hurdles: privacy concerns, inconsistent quality, and reliance on cloud platforms. That’s starting to change.

Local: The On-Device Renaissance

Smartphones and wearables now come with NPUs capable of running billion-parameter models offline. Qualcomm’s AR1+ chip supports Small Language Models in smart glasses, its Snapdragon X chips bring on-device AI to PCs, and Google’s Gemini Nano runs entirely on-device in Android. Models like Whisper now run in under a second on basic handsets. This local-first architecture offers private, low-latency voice interfaces, unlocking use cases that were previously impossible.

Global: Voices That Work Anywhere

On-device model compression solves the where. The who and how are next. Today’s best TTS models let developers specify emotion, cadence, and terminology, so they can sound human in any domain or dialect. We’ll soon see “dialect packs” that activate based on SIM card or locale, bringing localized, expressive AI voices to the mainstream.
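
To make the "dialect pack" idea concrete, here is one way locale-based activation might look. This is a sketch under stated assumptions, not a description of any shipping product: the pack names, the `DIALECT_PACKS` registry, and the `resolve_dialect_pack` function are all hypothetical, and the choice to let the SIM country override the region in the device locale is an assumption made for illustration.

```python
from typing import List, Optional

# Hypothetical registry: locale tag -> voice pack name (illustrative only).
DIALECT_PACKS = {
    "en": "english-neutral",
    "en-IN": "english-india",
    "es": "spanish-neutral",
    "es-MX": "spanish-mexico",
}
DEFAULT_PACK = "english-neutral"


def resolve_dialect_pack(device_locale: Optional[str],
                         sim_country: Optional[str] = None) -> str:
    """Return the most specific available pack, falling back to broader ones."""
    if not device_locale:
        return DEFAULT_PACK

    # Accept both "en_US" and "en-US" style tags.
    parts = device_locale.replace("_", "-").split("-")
    language = parts[0].lower()
    region = parts[1].upper() if len(parts) > 1 else None

    # Assumed precedence: SIM country, then device region, then language only.
    candidates: List[str] = []
    if sim_country:
        candidates.append(f"{language}-{sim_country.upper()}")
    if region:
        candidates.append(f"{language}-{region}")
    candidates.append(language)

    for tag in candidates:
        if tag in DIALECT_PACKS:
            return DIALECT_PACKS[tag]
    return DEFAULT_PACK


print(resolve_dialect_pack("en_US", sim_country="in"))
print(resolve_dialect_pack("es-AR"))
print(resolve_dialect_pack("fr-FR"))
```

The fallback chain is the point of the sketch: a user whose exact region has no pack still gets a voice in the right language, and only an unsupported language drops to the default.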

Moats Will Be Built on Trust

With new regulations in the U.S. (FCC) and EU (AI Act), compliance is no longer optional. The FCC has ruled that robocalls using AI-generated voices are illegal without the recipient's prior consent, and watermarking and transparency mandates are coming fast. Companies like Reality Defender already offer tools to detect deepfakes and embed audio provenance. Soon, compliance APIs will be as essential as SSL certificates.

Companions That Connect

The real moat in consumer AI isn’t just voice quality—it’s emotional intelligence. Metrics like LPI (Likeability Per Interaction) will determine stickiness. Companies like Sesame are leading the way with AI companions that are expressive, ambient, and memorable. Privacy-respecting edge/cloud hybrids will define the next generation of assistants—not just what they can do, but how they make us feel.

Voice AI is quietly becoming the connective tissue across logistics, insurance, healthcare, and beyond. In the enterprise, there’s a window to define new categories by building trust and integrating deeply into workflows. On the consumer side, the winners will be those who combine hardware-aware intelligence, global inclusivity, and human connection. The future isn’t just about voice—it’s about voices that work for us, wherever we are.
