Senior AI Voice Engineer
Allup
About our client
heyLibby is a seed-stage startup on a mission to give every small-to-midsize fitness and wellness business a world-class AI team member. It was co-founded by Spencer Rascoff, the former CEO of Zillow and current CEO of Match Group. The AI works seamlessly across phone, email, text, and chat to help businesses manage their communications efficiently. The team ships fast, learns fast, works closely with top industry experts, and measures itself by the tangible impact it delivers to customers.
Why this role matters
The voice channel is the heart of the product. heyLibby needs a seasoned engineer who lives and breathes real-time audio, someone who can make every call feel like a conversation with a helpful human, not a bot. This person will own the voice stack end-to-end: latency, natural-sounding TTS, crisp recognition, graceful live transfers, and bulletproof uptime.
What You’ll Do
- Architect & Own the Voice Pipeline: Design, build, and maintain low-latency, highly available services (Twilio, SIP/WebRTC, WebSockets, streaming gRPC) that power phone calls at scale.
- Push the Limits of Natural Speech: Integrate and fine-tune TTS/ASR models so symbols, numbers, names, and slang are pronounced and recognized flawlessly. Experiment fearlessly, measure obsessively.
- Deliver Call Features Users Love: Live transfers, IVR fallbacks, appointment scheduling, smart error recovery.
- Optimize for Speed & Reliability: Track every hop from carrier to model; squeeze milliseconds, harden edge cases, and own the on-call rotation that keeps calls flowing 24/7.
- Collaborate Across the Stack: Pair with product, ML, and frontend engineers to craft APIs and in-app controls that unlock amazing end-to-end experiences.
What they’re looking for
- 3+ years building real-time audio or telephony systems (Twilio, SignalWire, FreeSWITCH, or similar)
- Hands-on with streaming/websocket architectures and back-pressure handling
- Fluency with Node.js/TypeScript or another modern backend language in production
- Proven track record of optimizing for sub-second latency and high concurrency
- Strong system-design skills: queues, backoffs, retries, observability
- Comfortable in AWS (ECS/Fargate, Lambda, DynamoDB/RDS, CloudWatch)
- Bias for shipping, ownership, and clear communication
Bonus Points
- Deep knowledge of SIP, RTP, or WebRTC internals
- Worked on carrier integrations, number provisioning, or call routing at scale
- Experience with AWS CDK / IaC and event-driven pipelines
- Background in speech science, TTS/ASR model tuning, or voice biometrics
- Early-stage startup or green-field product experience
- Familiarity with React, enough to build small internal UIs when needed
- Active contributor to open-source voice/telephony projects
How They Work & What You Get
- Autonomy & Ownership – You’re the voice authority; the team trusts you to drive decisions.
- Velocity – Weekly releases, minimal ceremony, rapid experimentation.
- Impact – Your code will be on the critical path for every customer interaction.
- Competitive Comp + Meaningful Equity – Grow with the company as it scales.
- Flexible, People-First Culture – We work hard, but we protect family and personal time.
Salary info: $175,000 to $215,000 per year with equity.
Ready to build the most human-sounding AI voice on the market? Apply now and let’s redefine how small businesses talk to their customers.