Why Most AI Wrappers Die — And What Survivors Do Differently
Gil Feig, CTO at Merge
The uncomfortable truth about AI wrappers is that most of them are building on quicksand. They ship fast, get early traction, and then hit a wall that no amount of prompt engineering can fix.
Gil Feig has seen this pattern play out dozens of times. As CTO of Merge, he’s watched startups build integrations on top of AI APIs and either thrive or collapse — and the difference almost never comes down to the model they chose.
The Integration Trap
Most AI startups start with a single API call. OpenAI, Anthropic, Cohere — pick your provider and build a wrapper. The problem isn’t the wrapper itself. It’s that founders treat the AI call as the product when it’s actually just one component of a much larger system.
“The companies that survive are the ones that realize early that the AI model is a commodity,” Gil explains. “Your moat is everything around it — the data pipeline, the integration layer, the feedback loops.”
What Survivors Do Differently
The startups that make it past the 18-month mark share three characteristics:
1. They own their data pipeline. Instead of relying on generic embeddings, they build custom preprocessing that understands their domain. A legal AI startup that survives isn’t just passing documents to GPT — they’re building a pipeline that understands clause structures, precedent hierarchies, and jurisdiction-specific formatting.
2. They build for failure. Every AI call will eventually return garbage. The survivors build graceful degradation from day one — fallback responses, confidence thresholds, human-in-the-loop escalation paths.
3. They measure what matters. Not just API latency and token costs, but actual business outcomes. Did the user accomplish their goal? Did the AI-generated output require manual correction? These metrics drive product decisions, not vanity metrics about prompt performance.
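The "build for failure" point above can be sketched in a few lines. This is a minimal illustration, not Merge's implementation: the `ModelResponse` type, the `CONFIDENCE_THRESHOLD` value, and the escalation flag are all assumptions for the example, and a real confidence score would come from log-probs, a verifier model, or user feedback.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.7  # assumption: tune per domain and risk tolerance
FALLBACK_MESSAGE = "I'm not confident in this answer — routing to a human reviewer."

@dataclass
class ModelResponse:
    text: str
    confidence: float  # assumption: 0.0-1.0, derived from log-probs or a verifier

def answer_with_degradation(response: ModelResponse) -> tuple[str, bool]:
    """Return (message, escalated). Low-confidence output never reaches the user raw."""
    if response.confidence < CONFIDENCE_THRESHOLD:
        # human-in-the-loop escalation path
        return FALLBACK_MESSAGE, True
    # confident enough to serve directly
    return response.text, False
```

The key design choice is that the fallback path exists from day one, so adding richer escalation logic later is a change to one function rather than a rewrite.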
The Infrastructure Stack That Scales
Gil breaks down the minimum viable infrastructure for an AI startup that wants to survive:
- Observability layer: Every AI call logged with input, output, latency, and cost. Non-negotiable.
- Evaluation framework: Automated tests that catch regressions when you change prompts or switch models.
- Caching layer: Not just for cost savings — for consistency. Users expect the same input to produce similar outputs.
- Version control for prompts: Treat prompts like code. Review them. Test them. Roll them back when they break.
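The observability layer above can be as simple as a wrapper that logs every call before returning it. This is a sketch under stated assumptions: `logged_call`, the flat per-token cost, and the whitespace token estimate are illustrative stand-ins, not a real billing model or tokenizer.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ai_calls")

def logged_call(model_fn, prompt: str, cost_per_token: float = 0.00001) -> str:
    """Wrap any model call; log input, output, latency, and estimated cost as JSON."""
    start = time.perf_counter()
    output = model_fn(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    # assumption: crude token estimate via whitespace split — swap in a real tokenizer
    tokens = len(prompt.split()) + len(output.split())
    logger.info(json.dumps({
        "prompt": prompt,
        "output": output,
        "latency_ms": round(latency_ms, 2),
        "est_cost_usd": round(tokens * cost_per_token, 6),
    }))
    return output
```

Structured JSON logs like these feed directly into the evaluation framework: the same records that power dashboards become regression test cases when you change a prompt or switch models.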
The Model-Switching Reality
One of Gil’s most contrarian takes: plan to switch models from day one. “The companies that are locked into a single provider are the most vulnerable. When GPT-5 drops and Anthropic responds with Claude 4, you need to be able to evaluate and switch within days, not months.”
This means abstracting your AI layer early. Not with a heavy framework — just a clean interface that separates your business logic from the model call.
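A clean interface like the one Gil describes might look like the sketch below. The `ChatModel` protocol and the two provider classes are hypothetical stubs for illustration; real implementations would call each provider's SDK inside `complete`.

```python
from typing import Protocol

class ChatModel(Protocol):
    """The one seam between business logic and any model provider."""
    def complete(self, prompt: str) -> str: ...

class OpenAIModel:
    # assumption: a real version would call the OpenAI SDK here
    def complete(self, prompt: str) -> str:
        return f"[openai] {prompt}"

class AnthropicModel:
    # assumption: a real version would call the Anthropic SDK here
    def complete(self, prompt: str) -> str:
        return f"[anthropic] {prompt}"

def summarize(model: ChatModel, text: str) -> str:
    """Business logic depends only on the interface, never on a provider."""
    return model.complete(f"Summarize: {text}")
```

Because `summarize` only knows about `ChatModel`, evaluating a new provider means writing one small adapter class and rerunning the evaluation suite — days, not months.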
FAQ
What’s the biggest mistake AI wrapper startups make?
Building the entire product around a single model’s specific behavior. When that model updates or a better alternative emerges, the product breaks in ways that are expensive to fix.
How long does it typically take for an AI wrapper to hit scaling problems?
Most hit their first major infrastructure crisis between 6 and 12 months after launch, usually when they cross 1,000 daily active users and start seeing the edge cases that prompt engineering can't solve.
Should AI startups build or buy their infrastructure?
Build the core differentiator, buy everything else. If your competitive advantage is in how you process legal documents, build that pipeline. But use existing tools for observability, caching, and deployment.
What’s the minimum team size needed to build a scalable AI product?
Gil suggests 3-4 engineers minimum: one focused on the AI/ML layer, one on infrastructure, one on product/frontend, and ideally one dedicated to data quality and evaluation.
Watch the full conversation
Hear Gil Feig share the full story on Heroes Behind AI.
Watch on YouTube