ok.proof

Hire developers who
get things done with AI.

Give them a real task and an AI coding agent.
Watch every prompt, decision, and iteration.

See how it works
Templates

Tests for every role

Developers

Can they ship, or just talk about shipping?

Fix a broken API with 3 failing endpoints
Build a dashboard from an empty project
Refactor a 400-line component into clean modules

Product

Can they turn messy inputs into a clear plan?

Write a product spec from raw customer research
Build a competitive analysis from a brief
Create a launch plan page from a spec

Designers

Can they spot what's broken and make it better?

Redesign a checkout flow with 68% drop-off
Build a design system from inconsistent components
Fix a page that scores 34 on accessibility

Leaders

Can they find the signal in the noise?

Build a board update page from raw metrics
Write a vendor evaluation from 4 proposals
Create a strategy memo from messy data

Or bring your own product.

Upload starter files from your actual product. Candidates open the sandbox with your code, your data, or your docs already inside.

api/routes/checkout.ts
async function processCheckout(req) {
  const cart = await getCart(req.id)
  // BUG: returns null for guest users
  const total = cart.items.reduce(...)
  return charge(total)
}

Your codebase

A real bug from your issue tracker. A feature from your roadmap.
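The `BUG` comment in the snippet above marks exactly the kind of fix a candidate would be asked to ship: `getCart` returns null for guest users, so the `reduce` call crashes. A minimal sketch of one possible fix, with hypothetical `Cart`/`Item` shapes and stubbed `getCart`/`charge` standing in for the real starter-file implementations:

```typescript
// Hypothetical types and stubs; the real shapes live in the starter files.
type Item = { price: number; qty: number };
type Cart = { items: Item[] };

// Stubbed lookup that mirrors the bug: guests (no id) get null.
async function getCart(id?: string): Promise<Cart | null> {
  if (!id) return null;
  return { items: [{ price: 25, qty: 2 }] };
}

// Stand-in for the payment call.
async function charge(total: number): Promise<number> {
  return total;
}

async function processCheckout(req: { id?: string }): Promise<number> {
  const cart = await getCart(req.id);
  // Fix: fall back to an empty cart instead of crashing on null
  const items = cart?.items ?? [];
  const total = items.reduce((sum, i) => sum + i.price * i.qty, 0);
  return charge(total);
}
```

Optional chaining plus a nullish fallback keeps guest checkouts from throwing; whether a guest should instead be redirected to sign-in is a product call the candidate gets to make and explain.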

Checkout — Redesign v2
Pay now
68% drop-off here

Your designs

A screen that needs rethinking. A flow with real drop-off data.

Q1 Revenue by Segment.csv
Segment,ARR,Accts,Renewal
Enterprise,$48k,12,Q2
Growth,$124k,38,Q1
Starter,$8k,204,Q3
Agency,$31k,7,Q2

Your data

A messy spreadsheet that needs a story. A dashboard that needs building.

Vendor Evaluation — Draft
This section needs a clearer recommendation

Your documents

A brief that needs sharpening. A proposal that needs structure.

How it works

Set up in minutes, not days

01

Describe the task

Pick a template or write your own task. Set a time limit, prompt cap, and AI model. Send each candidate a unique invite link.

New test
Title: Build a dashboard
Time: 45 min
Model: Claude

02

They build with AI

Candidates chat with an AI agent that writes code and builds the project. They see it come together live as they work. No IDE, no setup — just a link.

Live session · 32:15

Add a chart component

Creating Chart.js line chart...

03

You see everything

Read every prompt they wrote. See how they broke down the problem and changed direction. Click through a working demo of what they built.

Review
Preview · Code · Chat · Timeline
Why it works

Built for how hiring actually happens

Real work, real signal

Candidates build a working project, not solve a puzzle. You see what they'd actually ship on day one.

See the thinking, not just the output

Every prompt, every iteration, every file change — captured in a full AI timeline. The process tells you more than the result.

45 minutes, not a week

Candidates finish in one sitting. No calendar coordination, no follow-ups, no ghosting.

Isolated sandbox per candidate

Each candidate gets their own Linux VM. Same tools, same starting point. No advantage from local setup or outside help.

Claude, GPT, or Gemini

9 models across Anthropic, OpenAI, and Google. Pick the one that matches how your team works.

Review on your schedule

Replay the full session, read their code, and click through what they built. No live session needed.

Templates and starter files

Start from a built-in template or upload your own files. Candidates begin with real context, not a blank page.

Send a link, that's it

No accounts for candidates, no IDE installs, no environment setup. They click and start building.

FAQ

Common questions

How is this different from a take-home?

Take-homes take days and you only see the end result. Here, candidates finish in one sitting and you see how they got there.

Do candidates need to install anything?

No. You send them a link. They open it in a browser and start building. No account, no IDE, no setup.

How do you prevent cheating?

Each candidate gets an isolated Linux VM. The only way to write code is through the AI chat. Every prompt, tool call, and file change is logged and replayable.

How long is a session?

Most teams run 30–60 minute sessions. You can set anywhere from 5 minutes to 8 hours depending on the role.

Which AI models can candidates use?

Claude Sonnet, Opus, or Haiku. GPT-5.2, GPT-5.2 Codex, or O4 Mini. Gemini 2.5 Pro, Flash, or 3.1 Pro. You pick the model when you create the test.

What does it cost?

Free during early access. Join the waitlist and you’ll be among the first to try it.