This A/B Testing Method Changes Everything
The future of A/B testing belongs to teams with discipline. Every documented test becomes fuel for your AI advantage.
Most teams are doing A/B testing wrong.
They’re running experiments, but without a real system behind them. Tests get shipped. Results come in. But there’s no learning system underneath - no rigor, no documentation, no clear understanding of why something worked or didn’t.
Small wins happen, but they don’t scale.
The learnings can’t be applied to other parts of the business. And when a test fails, they don’t learn enough to improve their hit rate for the next time.
When I was leading growth at Wistia and Postscript, I followed a more disciplined approach.
It wasn’t glamorous work, but it turned testing into a strategic asset - something that created compounding value over time and built trust across the company.
I didn’t realize it at the time, but that rigor became the foundation for what’s now possible with AI.
This piece breaks down exactly how to do it - and how to use AI to make every A/B test smarter than the last.
Why most A/B tests fail
Most teams test ideas, but they don’t test insights.
That single distinction changes everything.
When I look back at hundreds of experiments I’ve seen over the years, most had good intentions, but not much structure.
Teams changed buttons, copy, headlines, images, or designs - hoping something would move the needle. Sometimes it did.
But more often, they got stuck at the same conversion rates, or settled for small incremental improvements (3-5% here and there) instead of big step changes. That’s because they weren’t diagnosing the problem before running the test.
They didn’t understand why users weren’t converting in the first place.
If you don’t know what you’re trying to learn, you’re not testing - you’re guessing. And guessing doesn’t scale.
Once you understand that, the next question is where to focus your tests.
Map your growth model and find the leverage
Most teams skip this part. They jump straight into brainstorming the next easy test - the one someone’s been wanting to try, or the area that feels safest to experiment in.
But the real work begins one step higher.
You have to zoom out, look at how the business actually works, and identify where improving a single number could have the biggest overall impact. That’s your leverage point.
Here’s how I’ve done it.
I start by sketching out the full growth model - acquisition, activation, retention, monetization.
Then I layer in quantitative data to see where the growth rates - or conversion rates - might be underperforming. The numbers show you the size of the opportunity.
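If you want a quick way to run that quantitative pass, here’s a rough sketch in Python - the funnel stages and numbers are purely illustrative, so swap in your own analytics export:

```python
# Minimal funnel sketch for spotting the leverage point.
# Assumption: stage names and counts below are made up for illustration.
funnel = [
    ("visit", 100_000),
    ("signup", 8_000),
    ("activated", 3_200),
    ("paid", 480),
]

# Print step-to-step conversion so the weakest link stands out.
for (stage, n), (next_stage, next_n) in zip(funnel, funnel[1:]):
    rate = next_n / n
    print(f"{stage} -> {next_stage}: {rate:.1%}")
```

The step that looks worst against your benchmarks is usually where a single improvement compounds the most downstream.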
Once you’ve spotted those performance gaps, zoom in and use qualitative data to understand why they exist.
Watch session replays. Read churn survey responses. Talk to sales, support, and customers. The goal is to uncover the friction that’s holding users back from converting.
Quantitative data tells you where the leverage is.
Qualitative data tells you what’s blocking it.
Over time, I’ve learned this step separates teams that get lucky from teams that drive real growth.
Because when you start with leverage instead of ease, you stop running random tests - and start building a testing program that compounds.
Brainstorm differently: 10% vs 10× thinking
Most teams run brainstorming sessions like casual group chats.
Someone throws out an idea, others nod, and before you know it, you’re testing the most exciting opinion in the room.
We flipped that on its head. We made it structured. Every brainstorm had five minutes of context, fifteen minutes of quiet ideation, and ten minutes of sharing, combining, and discussing ideas as a group.
Then we added a single prompt: Which of these ideas might improve conversion by 10%, and which could improve it by 10x?
That tension changed everything. It pushed the team beyond surface-level tweaks into deeper, more creative problem-solving.
This is where creativity meets structure.
Prioritize using a rubric (ICE or RICE)
Choosing which test to run next shouldn’t depend on who talks loudest.
It should depend on the evidence.
Score every idea using the ICE framework: Impact, Confidence, and Ease. (Some teams use RICE, adding Reach - but the principle is the same.)
This removes ego from the process.
Instead of debating which idea “felt” most promising, you’ll have a clear, shared system to rank them. High-impact ideas that are reasonably easy to execute rise to the top. Harder, riskier bets go into the queue for later.
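Here’s a rough sketch of what that scoring can look like in Python - the idea names and 1-10 scales are illustrative, and some teams multiply the three scores instead of averaging them:

```python
# Minimal ICE-scoring sketch (assumptions: 1-10 scales and a simple average;
# adapt the rubric and idea names to your own backlog).
from dataclasses import dataclass

@dataclass
class Idea:
    name: str
    impact: int      # how big the lift could be (1-10)
    confidence: int  # how much evidence supports it (1-10)
    ease: int        # how cheap and fast it is to ship (1-10)

    @property
    def ice(self) -> float:
        return (self.impact + self.confidence + self.ease) / 3

ideas = [
    Idea("Rewrite pricing-page headline", impact=6, confidence=7, ease=9),
    Idea("Redesign onboarding checklist", impact=9, confidence=5, ease=3),
    Idea("Add social proof to signup form", impact=5, confidence=6, ease=8),
]

# Highest score first - that's your next test.
for idea in sorted(ideas, key=lambda i: i.ice, reverse=True):
    print(f"{idea.ice:.1f}  {idea.name}")
```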
A good testing roadmap balances quick wins that build momentum with bold swings that create breakthroughs.
Design each experiment (with rigor)
This is where most teams lose the plot. They skip the brief.
Most teams think writing an experiment brief is for big companies - or that it’s redundant because they already “know” the details. Others skip it because no one’s ever asked them to, so it feels like extra work.
But this step is where the leverage lives.
By skipping it, you’re not saving time. You’re skipping the learnings that would make every future test smarter.
That’s why every test should have a one-page doc before launch.
It doesn’t need to be fancy or overly complex. Include a clear hypothesis, one primary metric for success, screenshots of control and variation, sample size, duration, what you’re hoping to learn, and your next steps (based on win or lose). The most important piece is your kill criteria: decide upfront when you’ll stop or revert, so the test doesn’t drift into ambiguity.
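If you want a starting point for the sample-size line in the brief, here’s a rough Python sketch - it assumes scipy is installed, a standard two-sided z-test on conversion rates, a 0.05 significance level, 80% power, and made-up baseline numbers:

```python
# Minimal sample-size sketch for the brief (two-proportion z-test approximation).
# Assumptions: alpha = 0.05, power = 0.80, and illustrative conversion rates.
from scipy.stats import norm

def sample_size_per_variant(p_control: float, p_variant: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in each arm to detect the given absolute lift."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_variant) ** 2
    return int(n) + 1

# Example: 4% baseline, hoping to reach 5% (a 25% relative lift)
print(sample_size_per_variant(0.04, 0.05))  # roughly 6,700 per variant
```

Knowing that number upfront also makes the duration and kill criteria honest: if you can’t reach the sample size in a reasonable window, pick a bigger swing or a higher-traffic page.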
And later in this playbook, you’ll see why this kind of documentation becomes even more powerful - because it’s the foundation that makes AI-driven testing possible.
If you want a ready-made system for this, that’s exactly what my Growth Operating System was built for.
It includes the templates, workflows, and experiment trackers I’ve used to help dozens of growth teams build structure, speed, and confidence in their process.
Log wins and lessons to turn experiments into insights
The real value of testing isn’t in the immediate results.
It’s the compounding learnings over time.
Without a record of past experiments, it’s impossible to see which ideas work repeatedly, which ones keep failing, and where new opportunities might exist.
Sometimes you’ll notice something simple - like a test that performed well in one channel that’s worth trying in another. Other times, you’ll realize you’ve been testing the same idea in circles without moving on to something better.
That’s why you need a running log.
It can live anywhere: a spreadsheet, Notion, Airtable, whatever you use. For each test, record the ID, what you changed, the result, and what you learned.
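If you’d rather script it than maintain it by hand, here’s a rough sketch of a minimal log in Python - the file name and columns are placeholders you’d adapt to your own tracker:

```python
# Minimal experiment-log sketch (assumptions: a flat CSV called experiments.csv
# and these column names; a spreadsheet, Notion, or Airtable works just as well).
import csv
from pathlib import Path

LOG = Path("experiments.csv")
FIELDS = ["id", "change", "metric", "result", "learning"]

def log_test(row: dict) -> None:
    """Append one experiment to the running log, writing headers on first use."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)

log_test({
    "id": "EXP-042",
    "change": "Shortened signup form from 6 fields to 3",
    "metric": "signup conversion",
    "result": "+12% (significant)",
    "learning": "Form friction matters more than collecting extra data upfront",
})
```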
Then, every month, review your tracker to spot trends. If your system is working, you’ll start to see your hit rate improve - and that’s how you know your process is compounding.
And here’s where this becomes even more powerful.
Over time, this tracker turns into training data for AI. You can feed it into your LLM to help predict which future experiments are most likely to win, based on patterns from the past.
That’s why this step isn’t optional - it’s the foundation of the new AI-powered testing playbook.
The teams that document their learnings today will move faster tomorrow.
Use AI to shorten the learning loop
Some teams are using AI to replace experimentation.
That’s a huge mistake.
They’ll open ChatGPT and ask, “How can I improve conversions?” and get a list of generic ideas - copied from different markets, business models, audiences, and contexts.
So even if AI gives you an answer, it’s really just guessing. It doesn’t have a dataset for the moment you’re in… yet.
The real power of AI isn’t to replace experimentation.
It’s to make it faster.
When you train AI on your own data - your tests, your users, your model, your moment - you give it the context others are missing. That’s how you turn AI from a guessing machine into a growth engine.
I’ve been doing this for the past year, both in my own business and with coaching clients - and the results are wild. AI doesn’t always get it perfect, but it spots patterns in seconds that used to take hours.
Here’s how to do it…
When you plan a new batch of tests, pull up your wins/losses tracker and feed it into ChatGPT. Use a prompt like:
“Here are our last 80 experiments: what we hoped to accomplish, where we ran the test, what we wanted to learn, and what happened. Given this history, which of these five new ideas is most likely to win, and why?”
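If you’d rather script that loop than paste into the chat window, here’s a rough sketch using the OpenAI Python SDK - the model name, file layout, and idea list are all placeholders:

```python
# Minimal sketch of the prediction loop.
# Assumptions: the experiments.csv log from earlier, the official openai Python
# SDK (v1+), OPENAI_API_KEY set in the environment, and "gpt-4o" as the model.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

history = Path("experiments.csv").read_text()
new_ideas = """
1. Move pricing above the fold on the landing page
2. Replace the demo video with an interactive product tour
3. Add a money-back guarantee badge to checkout
4. Send a day-3 activation email with a setup checklist
5. Test annual-only pricing for the top tier
"""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are a growth analyst. Reason only from the experiment history provided."},
        {"role": "user",
         "content": ("Here are our past experiments (CSV):\n" + history +
                     "\nGiven this history, which of these five new ideas is most likely to win, and why?\n"
                     + new_ideas)},
    ],
)
print(response.choices[0].message.content)
```

The point isn’t the tooling - it’s that the model is reasoning from your history instead of the internet’s averages.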
Growth teams historically “hit” on one out of every four tests - 25%.
In my experience, AI predictions are right about 65% of the time - and they surface those insights in seconds, not hours or days.
That means fewer wasted cycles, faster iteration, and a team focused on what really moves the needle.
AI is only as smart as what you feed it.
That’s why rigor matters. Every experiment brief, every documented metric, every logged learning - it’s all data that AI can use to accelerate your next breakthrough.
Most teams run experiments. Few have a testing system
They launch tests, watch metrics, and move on.
But without structure - without clear hypotheses, documentation, and reflection - they’re just collecting outcomes, not insights.
Structure is what turns testing into an advantage. It’s what makes the wins repeatable, learnings transferable, and growth scalable.
And now, it’s also what unlocks the next playbook.
Because in this new era, the teams that build disciplined systems today will be the ones who can train AI on their real context tomorrow.
That’s where the compounding starts - where every test makes the next one smarter, faster, and more likely to win.
—
And if you want to build this kind of rigor into your own growth process, my Growth Operating System will help you do it.
It’s a plug-and-play framework for running faster, smarter experiments - with all the structure you need to turn testing into a real advantage.
…
(this post was originally published on my site - https://deliveringvalue.co/growth-essays/new-ab-testing-playbook-ai-era)


