MVP Engineering · 2024-07-04 · 9 min read

Shipping an AI MVP in 4 Weeks: The UltraMVP Playbook

Most AI projects fail not because the model is wrong but because the team spends 6 months building infrastructure for a product they have not validated. The 4-week MVP exists to flip that order: prove value first, then scale. This is the exact week-by-week playbook we run across 60+ AI MVPs.

[ 01 ]

Week 1 — Discovery and the success contract

We refuse to write code in week one. The team spends five days mapping the user journey, the data we can actually access, and the single metric that defines success. The output is a one-page ‘success contract’ signed by the founder, the user champion and the engineering lead.

If we cannot agree on a metric in week one, the project is not ready. Postponing the build is cheaper than shipping the wrong thing.

Map 3–5 real user workflows with timestamps and pain points
Audit available data: schemas, volume, freshness, PII risk
Pick one primary metric (e.g. ‘median time-to-resolution under 90s’) and one guardrail metric that must not degrade
Choose the smallest model that could plausibly meet the bar
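To make the contract concrete, here is a minimal Python sketch of how its two numbers might be encoded and checked against pilot data. The `SuccessContract` class, its field names and the ‘escalation rate’ guardrail are hypothetical illustrations, not a format we prescribe. The 90-second threshold comes from the example metric above.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class SuccessContract:
    """Hypothetical encoding of a one-page success contract."""
    primary_metric: str        # human-readable name of the one metric that defines success
    primary_threshold: float   # pass if the observed median is strictly below this value
    guardrail_metric: str      # name of the single guardrail metric (hypothetical example below)
    guardrail_max: float       # the guardrail must not exceed this value

    def evaluate(self, observations: list[float], guardrail_value: float) -> dict:
        """Check pilot observations against both numbers in the contract."""
        primary_value = median(observations)
        primary_ok = primary_value < self.primary_threshold
        guardrail_ok = guardrail_value <= self.guardrail_max
        return {
            "primary_value": primary_value,
            "primary_ok": primary_ok,
            "guardrail_ok": guardrail_ok,
            "passed": primary_ok and guardrail_ok,
        }


contract = SuccessContract(
    primary_metric="median time-to-resolution (s)",
    primary_threshold=90.0,
    guardrail_metric="escalation rate",  # hypothetical guardrail
    guardrail_max=0.05,
)
result = contract.evaluate([45.0, 80.0, 120.0, 60.0, 95.0], guardrail_value=0.03)
# result["primary_value"] == 80.0 and result["passed"] is True
```

Keeping the thresholds in code as well as on paper means the same check can later run inside the eval harness instead of being re-derived from a document.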

[ 02 ]

Week 2 — Vertical slice, end-to-end

Week two ships a working vertical slice on production-shaped infrastructure: real auth, real database, real model API, real observability, but only one user flow, with a rough UI. We deliberately avoid abstractions, queue systems and microservices until we have proof of value.

By Friday of week two, a real user runs the flow and we capture the first eval data. This is almost always the point where the product redesigns itself around what that user actually does.
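What ‘capturing eval data’ looks like varies by project. One minimal assumption is an append-only JSONL log with one record per run, which later feeds the metric calculations. The `record_run` helper and its schema are hypothetical; it logs an opaque run ID rather than a user identifier so that no PII is stored before redaction is in place.

```python
import json
import time
from pathlib import Path


def record_run(log_path: Path, flow: str, run_id: str,
               started_s: float, finished_s: float, resolved: bool) -> dict:
    """Append one run of a user flow to a JSONL eval log (hypothetical schema)."""
    record = {
        "flow": flow,                                    # which user flow was exercised
        "run_id": run_id,                                # opaque ID, not a user identifier
        "duration_s": round(finished_s - started_s, 3),  # time from start to finish of the flow
        "resolved": resolved,                            # did the flow reach its goal?
        "logged_at": time.time(),                        # wall-clock time of this write
    }
    # Append mode keeps every run; one JSON object per line is easy to reload later.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

A call such as `record_run(Path("eval_runs.jsonl"), "triage", "run-001", 10.0, 82.5, True)` appends a record with `duration_s` of 72.5.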

[ 03 ]

Week 3 — Evaluation, hardening and the second flow

We instrument the eval harness, lock the primary metric measurement, and start a daily regression run. The model, prompt, retrieval and tools are now iterated against numbers, not vibes.
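A daily regression run can be as simple as replaying a fixed set of cases through the current system and rejecting any change whose score drops below the last accepted baseline. The sketch below assumes exact-match scoring and uses a lookup table as a stand-in for the model call. Real harnesses usually need task-specific scorers, and `run_eval`, `daily_regression` and the `max_drop` tolerance are hypothetical names.

```python
from typing import Callable

# One eval case: (input sent to the system, expected output).
EvalCase = tuple[str, str]


def run_eval(system: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose output exactly matches the expectation."""
    hits = sum(1 for prompt, expected in cases if system(prompt).strip() == expected)
    return hits / len(cases)


def daily_regression(system: Callable[[str], str], cases: list[EvalCase],
                     baseline: float, max_drop: float = 0.02) -> tuple[bool, float]:
    """Pass only if today's score is within `max_drop` of the accepted baseline."""
    score = run_eval(system, cases)
    return score >= baseline - max_drop, score


# Stub "system" for illustration: a lookup table standing in for the model call.
answers = {"2+2": "4", "capital of France": "Paris", "3*3": "9", "opposite of hot": "warm"}
cases = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9"), ("opposite of hot", "cold")]

ok, score = daily_regression(lambda p: answers.get(p, ""), cases, baseline=0.75)
# score == 0.75, ok is True
```

One way to enforce ‘numbers, not vibes’ is to fail the scheduled job whenever the returned boolean is `False`, so a prompt or model change cannot merge while it regresses the baseline.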

We add the second user flow and the first guardrails: rate limits, cost caps, PII redaction, audit logging. The system stops looking like a prototype and starts looking like a product.
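Two of those guardrails, PII redaction and a cost cap, can start very small. This sketch uses deliberately naive regular expressions, which will miss many PII formats and will also catch some non-PII digit runs such as dates. It also uses a spend counter with no daily reset. `redact_pii` and `CostCap` are hypothetical illustrations, not production-grade controls.

```python
import re

# Deliberately naive patterns: illustrative only, not a complete PII detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like digit runs before logging or prompting."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


class CostCap:
    """Refuse further model calls once cumulative spend would exceed a limit (no reset)."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        # Check before recording the spend so the cap is never exceeded, only approached.
        if self.spent_usd + cost_usd > self.limit_usd:
            raise RuntimeError("cost cap reached; refusing model call")
        self.spent_usd += cost_usd


cleaned = redact_pii("Contact jane.doe@example.com or +1 (555) 123-4567")
# cleaned == "Contact [EMAIL] or [PHONE]"
```

Calling `charge` with the estimated cost before each model request means the request is refused instead of silently overspending.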

[ 04 ]

Week 4 — Production launch and handover

Week four is launch and handover. We finalize the deployment pipeline, write the runbook, train the customer team, and run a controlled rollout — usually 10% of users, with an explicit rollback trigger.
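One common way to implement a 10% rollout, assumed here rather than prescribed, is deterministic bucketing. A stable user ID is hashed so that each user consistently lands in or out of the cohort, and the rollback trigger becomes an explicit predicate over launch metrics. The salt, the thresholds (including reuse of the 90-second example bar) and the function names are hypothetical.

```python
import hashlib


def in_rollout(user_id: str, percent: int, salt: str = "mvp-launch") -> bool:
    """Deterministically place a user in or out of the rollout cohort."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable bucket in [0, 99] for this user
    return bucket < percent


def should_roll_back(error_rate: float, primary_metric_s: float,
                     max_error_rate: float = 0.02,
                     max_primary_metric_s: float = 90.0) -> bool:
    """Explicit rollback trigger: breaching either threshold means revert."""
    return error_rate > max_error_rate or primary_metric_s > max_primary_metric_s


cohort = sum(in_rollout(f"user-{i}", 10) for i in range(10_000))
# cohort is close to 1,000 (roughly 10% of users)
```

Because the bucket depends only on the salt and the user ID, widening the rollout from 10% to 25% keeps every user who was already in the cohort.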

By the end of week four, the customer team owns the system. We stay engaged for two weeks of post-launch support, then transition to an optional retainer for new flows.

[ 05 ]

What we deliberately leave out

Multi-tenancy, advanced fine-tuning, custom infra, multi-region deployment, full design system. These are scale-stage problems. Building them in week one is the most common reason AI MVPs ship in month nine.

[ Key takeaways ]

01 Refuse to build until a one-page success contract is signed
02 Week 2 ships an ugly end-to-end slice on production-shaped infra
03 Eval harness goes live in week 3, before any further tuning
04 Launch behind a 10% rollout with an explicit rollback trigger

[ FAQ ]

Frequently asked questions

What if our use case really cannot ship in 4 weeks?

Then the right move is to slice it. We will deliver a 4-week MVP of the riskiest 20% — usually the model or the data — and stage the rest in subsequent 2-week increments.

Who owns the code at the end of the engagement?

You do. Repository, infrastructure, model adapters, prompts and evals are transferred to your accounts on day one of week four.

Do you build a full design system alongside the MVP?

We ship product-grade UI but not a full design system. If the MVP earns its budget, we expand design in the post-launch phase.

[ Start your build ]

Start your 4-week AI MVP

Lock in discovery next week and ship a real product to real users 28 days later.
