Conversion Rate Optimization for Email Marketers

AB Testing Email Marketing: What to Test & How

AB email testing is a must-have strategy in your marketing program. Marketers and consumers are becoming more sophisticated, and the old tactic of blasting emails out to your audience no longer works. You need to make sure that your email subject and message are optimized for your audience.


What is AB email testing?

AB email testing is an easy process to understand:

  1. Create a hypothesis. (Will Subject A get more opens or Subject B?)
  2. Send half your audience Subject A and half Subject B.
  3. Measure how many opens each subject gets and determine whether one subject is better than the other.

There’s a lot to test in your emails: subject line, message, call to action, buttons, videos, images, and anything else you can think of.

When you are AB testing emails, though, make sure you are only testing one thing per email. Otherwise your variables get tangled together. If you are testing both the button text and a new CTA, you won't know which one caused the click-through rate to increase.

You can also send more than just two variations. Two variations is, technically, called an “A/B test.” More than two is called an “A/B/n test”. “n” indicates that there are more than two variations.
You might also hear AB testing called split testing.

How to start AB Testing Emails

Testing platforms

To start running AB tests for your emails, first check whether your email marketing platform offers a testing tool. Many do! Here are a couple of popular ones:
ConvertKit (Subject lines only as of 2023)
MailChimp (all paid plans)

If your platform isn’t in this list just Google the name of your email marketing platform and “ab testing”. You should be able to find out if they offer a split testing tool.

Without a testing platform

Even if your email marketing platform does not have an AB testing tool, you can still do AB email testing. There are just a few extra steps.

Method 1

You will need to take your email address list and split it into two random lists. To do this I normally download my entire list, sort it by something arbitrary like alphabetical order by email address (the order has nothing to do with engagement, so it's effectively random), and then create two new lists, each with half of the email addresses. Once I have those two lists I upload them into the email marketing platform as Test List A and Test List B.

Then I create two different email campaigns. They should be exactly the same EXCEPT for the one variable you want to test. Then I send the first campaign to the first list and the second campaign to the other list.
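If you'd rather script the split than shuffle spreadsheets by hand, here's a minimal Python sketch. It assumes your export is a CSV with an `email` column (the file and column names are just examples), and it uses a true random shuffle instead of an alphabetical sort:

```python
import csv
import random

def split_list(in_path, out_a="test_list_a.csv", out_b="test_list_b.csv"):
    """Split a subscriber CSV into two random halves for an A/B test."""
    with open(in_path, newline="") as f:
        rows = list(csv.DictReader(f))

    random.shuffle(rows)              # random order, not alphabetical
    half = len(rows) // 2
    halves = {out_a: rows[:half], out_b: rows[half:]}

    for path, subset in halves.items():
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(subset)
```

Upload the two output files as Test List A and Test List B, same as the manual version.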

Method 2

You could also add a custom field and assign either A or B to each record. Then set up the two campaigns and send the first to everyone with A in that field and the second to everyone with B in that field. Whichever way works better for you!
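The custom-field version can be scripted too. This sketch (the `ab_group` field name is my own choice, not anything your platform requires) shuffles the records and labels exactly half of them A so the groups come out even, rather than flipping a coin per record:

```python
import random

def assign_groups(records, seed=None):
    """Tag each record dict with an 'ab_group' of 'A' or 'B', split evenly."""
    rng = random.Random(seed)
    order = list(records)
    rng.shuffle(order)                 # randomize before labeling
    for i, record in enumerate(order):
        record["ab_group"] = "A" if i < len(order) // 2 else "B"
    return records
```

Then send the first campaign to everyone tagged A and the second to everyone tagged B.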

Analyzing Email Test Results

Unfortunately email A/B testing tools are pretty far behind website testing tools when it comes to validating results. If you want to learn and get the best results then you’ll need to understand a bit more about analytics and evaluating results. Don’t worry, it’s not too hard!

Why valid testing is important

If you have a coin nearby, flip it 10 times and count the number of heads and the number of tails. There's only about a 25% chance that you got exactly 5 heads and 5 tails, even though the chance of getting heads or tails is 50% on each flip.

If you got 6 heads or even 7 or 8 heads would you conclude that you were more likely to get heads on any given flip than tails? Probably not. You just instinctively know that sometimes the world is a bit too random to guarantee an equal number of heads and tails on any set of flips.

Split testing is like that, too. With too small of a sample you are just going to get a random result. If you are making decisions based on random results then you might as well just not test because you aren’t getting any benefit. You aren’t learning anything and there’s only a slightly increased chance that you’re picking the better variation each time.
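If you don't have a coin nearby, you can simulate the same thing. This quick sketch runs 1,000 sets of 10 flips and reports how often a set lands on exactly 5 heads (theory says about 25% of the time):

```python
import random

def heads_in_flips(n_flips, rng):
    """Count heads in a run of fair coin flips."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))

rng = random.Random(42)   # seeded so the demo is repeatable
counts = [heads_in_flips(10, rng) for _ in range(1000)]
exactly_five = counts.count(5) / len(counts)
print(f"Share of 10-flip runs with exactly 5 heads: {exactly_five:.0%}")
```

Most runs land on 4, 6, or something even further from the "true" 50/50 rate, which is exactly what an under-powered email test does to your results.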

Here’s a screenshot of a recent test I ran on a landing page. It started out with the blue version winning. But, as it gained more test participants, the results flipped and the orange variation ended up winning with about 80% confidence.

Side note: I normally shoot for 90% confidence as a rule of thumb. But since the orange variation was the control, I was willing to end the test, conclude that there was no significant proof the variation was better, and leave the control in place.

How many test participants do I need?

So, you need enough people (emails) in your test to make sure that your results aren’t just random, like our example coin flip. But how do you figure out how many is enough? You’ll need a couple pieces of information and Evan Miller’s Sample Size Calculator:

Your baseline conversion rate – This is your average open rate or click through rate, whichever you are testing.

Minimum detectable effect – This is more complicated. The more test participants (emails) you have, the smaller an effect you can detect with testing. Let’s say you set your minimum detectable effect at 10% but the results show 5% more opens with Subject B. Then you wouldn’t be able to tell if that effect was valid or not. For this I will often set a check-in if I can. I’ll send out enough emails to detect a 10% effect. If the results show a 10%+ effect then that’s great! I’ll send the rest of the emails with the winning subject line. If it’s less than 10% then I will send more emails with the two different subjects and check again.
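If you want a rough number without leaving your editor, the standard two-proportion sample size formula is easy to compute in Python's standard library. This is a sketch of the same kind of calculation Evan Miller's calculator does; his tool may use a slightly different approximation, so expect the numbers to be close but not identical:

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, mde_relative, alpha=0.05, power=0.8):
    """Approximate emails needed per variant for a two-sided two-proportion test.

    baseline      e.g. 0.20 for a 20% open rate
    mde_relative  e.g. 0.10 to detect a 10% relative lift (0.20 -> 0.22)
    """
    p1 = baseline
    p2 = baseline * (1 + mde_relative)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # significance threshold
    z_beta = NormalDist().inv_cdf(power)            # statistical power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# Example: 20% baseline open rate, 10% relative lift
print(sample_size_per_variant(0.20, 0.10))
```

Notice how quickly the required size grows as the minimum detectable effect shrinks; that's why small lists should test big changes.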

(Screenshot: sample size calculator for email testing)

Note: Checking in at intervals is not a best practice; it can produce more false positives. Then why do I do it? Because this is business, not pharmaceutical development or clinical trials. If I get one extra false positive for every 20 tests, no one dies.

Testing if the A/B email test results are valid

Ok, so you’ve run your test set. How do you know if the results are valid? Luckily for us, CXL developed a calculator. It was created for website testing but, with a little interpretation, we can make it work for email testing.

  1. Enter the test duration as 1
  2. Enter version A into control (sessions or users = emails sent, conversions = measured event such as opens or clicks)
  3. Enter version B into Variation 1
  4. In step 3, change the confidence level to 90
(Screenshot: CXL calculator analyzing email test results)

If everything’s red (like above) then it’s not a statistically valid result. If it’s green then your result is valid.

At the bottom of step 3 you’ll see “Required sample size per variant”. That shows you how many emails you would need to send for EACH variation in order to validate the result you are seeing.
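If you'd rather skip the copy-paste into CXL's calculator, the underlying check is a two-proportion z-test, which you can sketch in a few lines of standard-library Python. At the 90% confidence level I use, a result counts as valid when the p-value is below 0.10 (the sent/conversion numbers below are made up for illustration):

```python
import math
from statistics import NormalDist

def two_proportion_test(sent_a, conv_a, sent_b, conv_b):
    """Two-sided z-test: is version B's rate significantly different from A's?"""
    p_a, p_b = conv_a / sent_a, conv_b / sent_b
    p_pool = (conv_a + conv_b) / (sent_a + sent_b)      # pooled rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / sent_a + 1 / sent_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Example: 5,000 emails per variant, 20% vs 22% open rate
z, p = two_proportion_test(5000, 1000, 5000, 1100)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Run the same numbers with only 500 emails per variant and the p-value jumps well above 0.10, which is the coin-flip problem again.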

What if I need way more emails than I have?
Move on, or try testing bigger differences. The larger the change you are testing, the larger the effect you'll likely see. If you're testing one word, it's probably not making a measurable difference. If you're testing totally different emails, you'll probably start seeing some meaningful results.

Tools that validate for you

I have not seen an A/B email testing tool that does correct validation. But in case one is out there that I missed, or one comes along, this is what you could expect to see:

(Screenshot: test validation results)

It gives you the emails, the conversion events (opens or clicks, depending on the test), the probability to be best (shoot for 90% for a winner), and something called a confidence interval.

Confidence interval

The confidence interval is my favorite metric and the one most people don't understand. I'm about to complicate things, so skip this if you're just ready to get started. Otherwise buckle up and stick with me! A confidence interval shows you the range where the actual difference between the two variations likely lies.

What? You thought the difference between the two in the test was the actual difference? That would be way too easy for math. Remember the coin flip example? Even if you flipped a coin 10,000 times, it wouldn't come up heads exactly 5,000 times and tails exactly 5,000 times. The observed split only homes in on the true 50/50 rate as the flips pile up; according to statistical theory, you'd have to flip it an infinite number of times to pin it down exactly.

So the confidence interval gives a range of expected outcomes if the test had an infinite number of participants. This is extremely useful for making business decisions. Let's say I only have 75% confidence. That's much lower than the 90% I like to aim for. I can decide whether or not to implement the change based on the possible upside and downside. So if there's a possible 25% increase and only a 0.25% possible decrease, then it's almost all upside. It would make sense from a business perspective to implement that variation.
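Here's a minimal sketch of that calculation, using the standard normal-approximation interval for a difference in proportions (the email counts are made-up examples). If the whole interval sits above zero, the change is almost all upside:

```python
import math
from statistics import NormalDist

def diff_confidence_interval(sent_a, conv_a, sent_b, conv_b, confidence=0.90):
    """Confidence interval for (rate B - rate A), in absolute terms."""
    p_a, p_b = conv_a / sent_a, conv_b / sent_b
    se = math.sqrt(p_a * (1 - p_a) / sent_a + p_b * (1 - p_b) / sent_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Example: 5,000 emails per variant, 20% vs 22% open rate
low, high = diff_confidence_interval(5000, 1000, 5000, 1100)
print(f"90% CI for the lift: {low:+.1%} to {high:+.1%}")
```

In this example both ends of the interval are positive, so implementing the variation is a low-risk call even before you hit your confidence target.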

What AB email tests should I run?

The best tests have a hypothesis that can be applied widely to your marketing. That means testing general concepts instead of specifics.

Too specific: Is it better to say “home remedies” or “healing foods” in this email?
General: Does my audience respond better to emails about giveaways or discounts?

For nurture sequences you might want to get really specific. You're using those same emails over and over again, so they should be optimized. For one-off emails, stick to general concepts that you can apply to the next email you send as well. Growing your knowledge base is one of the most valuable returns on the time you spend setting up tests.
