Ad Testing Done Right: A Scientific Guide From The Pros

Brad Geddes
Co-Founder, AdAlysis

Ads are the only part of a PPC account that a searcher ever sees.

All your other settings and targets just tell a search engine whether or not to show your ad to a searcher. In the end, ads are one of the most important aspects of an entire PPC account.

Testing ads can increase your conversions, lower your cost per conversion, increase your quality score, and offer many other benefits across your account.

However, most ad testing is not done correctly.

This usually happens when advertisers don’t have a solid approach to deciding what they want to test and to picking winners based on data rather than guesswork.

In this guide to ad testing, I’ll walk you through exactly how to test your ads scientifically, so you can make informed decisions about how to run and measure ad tests and improve your paid search accounts.

The Types of Ad Testing

There are two ways to test ads, and each has its own advantages and disadvantages. We’ll lay out each testing method so you can make informed decisions about how you want to test.

Single Ad Group Testing

The most common type of ad testing is single ad group testing. This is an ad test where you create two or more ads within an ad group and then pick your top ad once you receive results.

This is commonly referred to as A/B testing.

However, many companies use three to five ads within an ad group, so A/B testing is a misnomer, as the term implies there are only two ads within an ad group.

IMPORTANT NOTE: Single ad group testing is best when you want to find the best message for a single targeting method.

An ad group is made up of two parts:

  • Targeting method
  • Ads

The targeting method could be keywords, remarketing lists, interests, topics, or even other methods that can trigger your ad to show.

However, only a single targeting method triggers the ads in that ad group, so you only learn how an ad performs with that targeting method.

The advantage of single ad group testing is that you will find the best message for a single targeting method. Therefore, this is usually the best testing method for:

  • Brand terms
  • Your top keywords
  • Your targeting methods with a lot of data

The disadvantages of single ad group testing are:

  • Low volume ad groups cannot be tested
  • You don’t receive insights that can be used in other ad groups

If an ad group does not receive a lot of data, you will never have enough information to achieve statistical significance in your testing measurements.

In addition, since you are only testing within a single ad group, your results only apply to the ad group where the test was conducted.

If you learn something about your ads and user interactions in one ad group, you can test that message in another ad group. However, it may or may not be the best message there, so you need to test it rather than simply reuse it.

In order to receive global insights or test low volume ad groups, we can turn to multi ad group testing.

Multi Ad Group Testing

Multi ad group testing is a testing technique where you can test a pattern, line of text, or label across multiple ad groups at once and then aggregate the data across the ad groups included within the test.

There are two main advantages of multi ad group testing.

  • Increased data
  • Global insights

Since you are aggregating data across multiple ad groups, you’ll have more data to work with, and thus low volume accounts and ad groups can be tested at scale.

Of course, you can also test your high volume ad groups with this method as well.

As you are including multiple ad groups within this type of test, the insights you receive apply not just to a single ad group but to all the ad groups included within the test.

Multi Ad Group Testing Examples

For example, here are some common ideas for multi ad group testing:

  • Does using geographies in our ads increase our CTR?
  • Will using Keyword Insertion increase or decrease our conversion rates?
  • Would searchers rather see a discount or free shipping in the ad?
  • Should we use ad customizers?
  • Should our second headline include a benefit, call to action, or an authority line?
  • Should our description focus on a wide variety of products or customer support?
  • Should we use our brand in our ads?

In each of these examples, there are questions to be answered that will help you shape how you write ads across the ad groups being included in the test.

Therefore, the first step to multi ad group testing is to determine what information you wish to know.

Multi Ad Group Testing Segments

Multi ad group testing allows you to receive insights across the ad groups being included in the test, but you generally don’t want to include all ad groups in your account within the test. You should only include ad groups in the test that have similar characteristics.

For instance, a lot of companies want to use their brand in their ads. However, if some ad groups contain brand keywords and other ad groups contain non-brand keywords, you are not going to get great results, as user intent is very different when someone searches for your brand versus a generic word.

A better hypothesis would be: Should we use our brand in non-brand campaigns?

Most accounts have these types of segments:

  • Your brand
  • 3rd party brands
  • Generic keywords
  • Competitor keywords
  • Remarketing
  • Display keywords
  • Etc

In these cases, each of these would be a segment and you would want to examine the data differently for each of these segments.

In some cases, you might break these down even further. For instance, here’s how a remodeling company broke out their search based account keywords:

  • Brand
  • High value remodeling (room addition)
  • Medium value remodeling (bathroom remodeling)
  • Low value remodeling (new kitchen cabinets)
  • Services (painting)
  • Etc

What they learned was that high value remodeling usually requires an in-person visit and estimate.

Most service quotes start with a phone call for a general quote before an in-person visit. Once they knew this, they could start tailoring their ads and landing pages to what searchers wanted.

The Setup

Setting up multi ad group tests is not too difficult once you have an idea of what you want to test. The main trick is to be consistent in your creation of ads. For instance, these count as two different lines when you aggregate your data:

  • Call Us Today
  • Call Us Today!

However, you don’t always have to be consistent depending on what you’re testing. An example is testing geographies. You might have ad headlines such as:

  • Find Plumbers in LA
  • Find Plumbers in Chicago
  • Find Plumbers in London
  • Local Plumbing Services

In this case you are really testing two lines:

  • Find Plumbers in <geo>
  • Local Plumbing Services
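Outside of a testing tool, one way to turn headlines like these into the two lines you are actually testing is to swap each known geography for a placeholder before grouping. The sketch below assumes you keep a hypothetical list of the geographies used in your ads (`KNOWN_GEOS`); `pattern_key` is an illustrative name, not part of any ad platform API.

```python
import re
from typing import Sequence

# Hypothetical list of geographies that appear in this account's ads.
KNOWN_GEOS = ("LA", "Chicago", "London")


def pattern_key(headline: str, geos: Sequence[str] = KNOWN_GEOS) -> str:
    """Replace any known geography with <geo> so variants share one key."""
    # Replace longer names first so a short name never clips part of a longer one.
    for geo in sorted(geos, key=len, reverse=True):
        # \b stops a match inside another word; matching is case-sensitive.
        headline = re.sub(rf"\b{re.escape(geo)}\b", "<geo>", headline)
    return headline


headlines = [
    "Find Plumbers in LA",
    "Find Plumbers in Chicago",
    "Find Plumbers in London",
    "Local Plumbing Services",
]
for h in headlines:
    print(f"{h:<26} -> {pattern_key(h)}")
```

The three city headlines collapse to "Find Plumbers in <geo>", while "Local Plumbing Services" keeps its own key, so the data rolls up into exactly the two lines being compared.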

This is a test to see if adding a geography to the ad increases searcher interaction. This could also be tested with prices versus discounts, such as:

  • Save <x %> on <product>
  • <products> from only <price>

Once you determine what you want to test and where you want to test it, then you want to create at least two ads (you can test three or four ideas at once if you have enough traffic volume) in every ad group included within the multi ad group test.

The trick is being able to then combine the data from the tests together.

Picking the Key

With multi ad group testing, you’re going to have to aggregate data across multiple ad groups.

To do this easily, you can use either ad testing software or pivot tables within Excel.

Pivot tables allow you to pivot the data across all of your ads on a ‘key’.

If you’re testing the exact same line of text in several ad groups, then the key is the actual line of text that you’ve created.

If you’re testing with wildcards, such as changing the geography in multiple ads, then your key is often an ad label (to help you organize). You should use the label option and label your ads included in each test so that you can pivot the data from the label.

For instance, if you’re testing geographies vs non-geographies, then each ad group will have two ads in it.

The ads that include geographies should carry a label such as ‘geo’, and the ads that don’t include geographies should carry a label such as ‘non-geo’.
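Here is a minimal sketch of that pivot in plain Python instead of Excel, assuming a hypothetical export with one row per ad and made-up numbers. It sums impressions and clicks per label and, in line with the segment advice earlier in this guide, only aggregates ad groups from one segment at a time. The field names (`segment`, `label`, and so on) are illustrative, not a real export format.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical export: one row per ad, with made-up numbers.
ads: List[dict] = [
    {"ad_group": "Plumbers LA", "segment": "generic", "label": "geo", "impressions": 1200, "clicks": 60},
    {"ad_group": "Plumbers LA", "segment": "generic", "label": "non-geo", "impressions": 1150, "clicks": 46},
    {"ad_group": "Plumbers Chicago", "segment": "generic", "label": "geo", "impressions": 800, "clicks": 44},
    {"ad_group": "Plumbers Chicago", "segment": "generic", "label": "non-geo", "impressions": 820, "clicks": 33},
    {"ad_group": "Brand", "segment": "brand", "label": "geo", "impressions": 5000, "clicks": 900},
]


def pivot_by_label(rows: List[dict], segment: str) -> Dict[str, dict]:
    """Sum impressions and clicks per label within one segment, then compute CTR."""
    totals: Dict[str, dict] = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for row in rows:
        if row["segment"] != segment:
            continue  # never mix segments (e.g. brand and generic) in one test
        totals[row["label"]]["impressions"] += row["impressions"]
        totals[row["label"]]["clicks"] += row["clicks"]
    return {
        label: {**t, "ctr": t["clicks"] / t["impressions"]}
        for label, t in totals.items()
    }


for label, t in pivot_by_label(ads, "generic").items():
    print(f"{label:8} impressions={t['impressions']:5} clicks={t['clicks']:4} ctr={t['ctr']:.2%}")
```

The brand row is left out of the generic comparison, so the geo versus non-geo totals only reflect ad groups with similar intent.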

You can use this same testing technique for other ad types too.

For instance, you might want to test different image themes across the display network. In that case, using labels is the easiest way to pivot the data to read the results.

Once you’ve set up your ad tests, single ad group or multi ad group, the next step is understanding how to pick winning and losing ads.

The Data

Once you run an ad test, you’ll start to accumulate data. There are five steps to working with your data:

  1. Ad rotation setting
  2. Determining what metrics you’ll use to pick your winning ads
  3. Ensuring you’re above minimum data
  4. Achieving statistical significance
  5. Maximum data

We’ll walk through each step to ensure you’re working correctly with the data you’re gathering.

Ad Rotation Settings

Your ad rotation is a setting at the campaign level within AdWords or Bing. This setting determines how the search engine rotates your ads within an ad group when there is more than one ad within an ad group.

The default setting is “Optimize for clicks”. This will show your highest CTR ads the most often.

But with ad testing, you usually want to use ‘Rotate indefinitely’ as that will ensure each ad receives roughly equal exposure.

You can use another setting if you’d like; however, it will take longer to achieve statistical significance, because some ads will be shown much more often than others. That’s why I always recommend the ‘Rotate indefinitely’ setting.

Choosing Winning Metrics

There are six main testing metrics you can use in your ad testing:

  • CTR: click through rate
  • Conversion rate
  • CPA: cost per acquisition
  • ROAS: return on ad spend
  • CPI: conversion per impression
  • RPI: revenue per impression

Each metric has advantages and disadvantages and we’ll walk through them all:

CTR: Click Through Rate

The advantage of using CTR as your primary testing metric is that quality score is closely related to CTR and the higher your CTR, the higher your quality score often becomes. This metric also ensures you receive the most clicks possible from your ads.

The disadvantage of CTR is that it doesn’t care about the quality of the click.

Even if you’re not measuring conversions, you usually want quality visitors (such as those who spend 3 minutes on site) over users who leave your page right away. If you care about the quality of the visits, then you should use a metric that is based on conversion data.

Conversion Rate

The ad sets the expectation for what a user will find on your landing page.

When your ads set the proper expectations, your conversion rates generally increase.

Therefore, when you’re focused on getting conversions once a user gets to your site/landing page, conversion rate becomes a good metric to use.

The primary disadvantage of conversion rate is that it doesn’t account for total conversion volume.

As this metric only looks at the ratio of conversions to the users who made it to your page, it doesn’t care how many users actually made it to your website. An ad with a higher conversion rate but far fewer clicks can still bring in fewer total conversions. This is generally a good metric to use in testing landing pages, but not in ads.

CPA: Cost Per Acquisition

CPA is a good metric to use when you have absolute targets you must hit in your PPC account.

Because this metric ensures that you choose ads whose CPAs are in line with your advertising goals, it’s a good testing metric to use.

The primary disadvantage of CPA is that, just as with conversion rate, it doesn’t care about volume. In PPC, you only incur cost when a user actually clicks on your ad, so CPA doesn’t care how often users click on your ad in the first place.

ROAS: Return on Ad Spend

When you have variable checkout amounts, such as in ecommerce, you’re often focused on the ratio between your ad spend and your revenue. Using ROAS as a testing metric helps ensure that your ad spend is aligned with the revenue it generates.

The disadvantage of this metric is that it once again doesn’t take volume into account. It only cares about revenue relative to the money spent, and you only spend money when a user actually clicks your ad.

Impression Based Metrics

The primary disadvantage of all the metrics listed (except for CTR) is that they don’t take volume into account; they only take cost or conversion data into account.

When you think of the search network, every time your ad appears you have a chance for a conversion. A user searched for something that triggered your ad.

At that moment, you have a chance at a conversion, so every impression is an opportunity to convert.

Therefore, in order to consider volume (CTR) and conversion data (conversion rate or revenue) in a single metric for ad testing, you can use CPI (conversion per impression) or RPI (revenue per impression) as your primary testing metrics.

These are easy metrics to determine: just divide your conversions (or revenue) by your impressions. The higher your CPI or RPI, the more conversions (or revenue) you earn every time your ad is displayed.
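To see why the impression-based metrics can change the verdict, consider two ads with made-up numbers and equal impressions. Ad A converts a higher share of its clicks, but Ad B gets three times the clicks and produces more conversions and revenue from the same number of impressions. The function names are illustrative.

```python
# Made-up numbers for illustration only.
ads = {
    "Ad A": {"impressions": 1000, "clicks": 20, "conversions": 4, "revenue": 400.0},
    "Ad B": {"impressions": 1000, "clicks": 60, "conversions": 9, "revenue": 810.0},
}


def conversion_rate(ad: dict) -> float:
    return ad["conversions"] / ad["clicks"]


def cpi(ad: dict) -> float:
    # Conversions per impression: rewards both click volume and conversion rate.
    return ad["conversions"] / ad["impressions"]


def rpi(ad: dict) -> float:
    # Revenue per impression.
    return ad["revenue"] / ad["impressions"]


for metric in (conversion_rate, cpi, rpi):
    winner = max(ads, key=lambda name: metric(ads[name]))
    print(f"{metric.__name__}: {winner}")
# conversion_rate: Ad A  (20% vs 15%)
# cpi: Ad B              (0.009 vs 0.004)
# rpi: Ad B              (0.81 vs 0.40)
```

Judged by conversion rate alone, you would pause Ad B, even though it delivers more than twice the conversions for the same number of searches.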

These are generally the best testing metrics to use.

Using Two Metrics at Once

The disadvantage of the impression based metrics is that they don’t take into account any hard caps you might have on your advertising; such as “you must achieve a specific CPA (or lower)” or that “you must hit a minimum ROAS goal”.

If you have hard caps on your advertising, then you can use ROAS or CPA as a filtering metric and then pick your highest CPI or RPI ad as your winner.
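A sketch of that two-step rule, assuming a hypothetical CPA cap and illustrative ad data: drop every ad that misses the cap, then pick the highest CPI among the rest. A ROAS floor would work the same way with the filter condition flipped (`revenue / cost >= min_roas`). `pick_winner` is a hypothetical helper name.

```python
from typing import Dict, Optional

# Illustrative ad data (made-up numbers).
ads = {
    "Ad A": {"impressions": 5000, "cost": 300.0, "conversions": 10},  # CPA 30, CPI 0.0020
    "Ad B": {"impressions": 5000, "cost": 700.0, "conversions": 14},  # CPA 50, CPI 0.0028
    "Ad C": {"impressions": 5000, "cost": 360.0, "conversions": 12},  # CPA 30, CPI 0.0024
}


def pick_winner(ads: Dict[str, dict], max_cpa: float) -> Optional[str]:
    """Filter on the hard CPA cap, then return the ad with the highest CPI.

    Returns None when no ad meets the cap.
    """
    eligible = [
        name
        for name, ad in ads.items()
        if ad["conversions"] > 0 and ad["cost"] / ad["conversions"] <= max_cpa
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda name: ads[name]["conversions"] / ads[name]["impressions"])


print(pick_winner(ads, max_cpa=40.0))  # Ad C: Ad B has the best CPI but misses the cap
```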

Once you’ve established how you’re going to pick winning ads, you must hit minimum data amounts before you can actually pick them.

Confidence Levels & Statistical Significance

A confidence level is the percentage chance that an outcome was not due to chance, therefore giving you statistical significance when it comes to your ad testing.

For instance, if an ad test has a confidence level of 80%, then there’s a 20% chance that the outcome is due to chance and not due to a repeatable pattern.

So before you decide to take action on any ad test, you should ensure that you’ve achieved a confidence level of at least 90%.

There are a few ways to determine statistical significance:

  • Use an online calculator
  • Use an Excel add-in, such as the Analysis ToolPak
  • Use a 3rd party system that calculates this for you automatically, such as AdAlysis (full disclosure, I own this tool and therefore, it’s awesome)

In general, you don’t want to go below 90% confidence as you’re taking too big of a risk that the results are due to chance.
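The guide doesn’t say which statistical test these calculators run. One common choice for comparing the CTRs of two ads is a two-proportion z-test, sketched below with only the standard library; treat it as an illustration, not as the formula any particular tool uses. It reports confidence as 1 minus the two-sided p-value, which matches the "80% confidence means a 20% chance" reading above. It only compares two ads, and its normal approximation gets shaky with very few clicks, which is one more reason to reach minimum data first.

```python
import math


def ctr_confidence(clicks_a: int, impressions_a: int,
                   clicks_b: int, impressions_b: int) -> float:
    """Two-proportion z-test on the CTRs of two ads.

    Returns a confidence level in [0, 1], computed as 1 - two-sided p-value.
    """
    ctr_a = clicks_a / impressions_a
    ctr_b = clicks_b / impressions_b
    # Pooled CTR under the assumption that both ads really perform the same.
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b))
    if std_err == 0:
        return 0.0  # e.g. zero clicks on both ads: no evidence either way
    z = (ctr_a - ctr_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return 1 - p_value


conf = ctr_confidence(120, 2000, 80, 2000)
print(f"{conf:.1%}")
if conf >= 0.90:
    print("Meets the 90% minimum confidence level.")
```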

This handy chart can serve as an overall reference guide to best practices for determining your minimum confidence levels.

Term Type | Minimum Confidence
Long Tail Keywords | 90%
Mid Data Terms | 90% – 95%
3rd Party Brands You Sell | 90% (small brands) to 95% (large brands)
Top Keywords (the ones you watch daily) | 95% – 99%
Your Brand Terms | 95% (unknown brand) – 99% (well-known brand)

The main problem with confidence factors is that they assume the data you already have will be similar to the data that comes later, and that assumption doesn’t always hold in search.

For instance, search behavior on a Monday morning when people are starting work is very different than search behavior on a Saturday afternoon when they’re relaxing at home.

Therefore, before you calculate your confidence factors, you must ensure that you have hit some minimum data requirements.

Minimum Data

Minimum data is the absolute minimum information you must have for an ad before you decide to examine the metrics and find if you have a winner or not.

For instance, let’s say that you have these three ads in a test:

  1. 1 click, 40 impressions
  2. 5 clicks, 33 impressions
  3. 0 clicks, 24 impressions

From a purely mathematical point of view, ad 2 has achieved a 97% confidence factor that it will have the highest CTR among these three ads.

The problem with that assumption is that this data might have occurred within a single hour on a Tuesday night, as it has a total of just 97 impressions.

To ensure that you don’t have this problem of ending a test too soon, you should define your minimum data and then examine your confidence factors after each ad has achieved minimum data.
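A minimal sketch of that gate, using the three ads from the example and assuming all 97 impressions arrived on a single Tuesday. The thresholds (`MIN_IMPRESSIONS`, `MIN_DAYS`) and the helper names are hypothetical placeholders; pick your own numbers for your traffic level.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

# Hypothetical thresholds: pick your own for your traffic level.
MIN_IMPRESSIONS = 350
MIN_DAYS = 7  # at least one full week, since search behavior changes across the week


@dataclass
class AdData:
    name: str
    impressions: int
    clicks: int  # kept for reference; this sketch only gates on impressions and days
    first_day: date
    last_day: date


def has_minimum_data(ad: AdData) -> bool:
    days_running = (ad.last_day - ad.first_day).days + 1  # count both end days
    return ad.impressions >= MIN_IMPRESSIONS and days_running >= MIN_DAYS


def ready_to_evaluate(test: List[AdData]) -> bool:
    # Only look at confidence once every ad in the test has reached minimum data.
    return all(has_minimum_data(ad) for ad in test)


tuesday = date(2024, 1, 2)  # hypothetical date; all data from a single day
test = [
    AdData("Ad 1", impressions=40, clicks=1, first_day=tuesday, last_day=tuesday),
    AdData("Ad 2", impressions=33, clicks=5, first_day=tuesday, last_day=tuesday),
    AdData("Ad 3", impressions=24, clicks=0, first_day=tuesday, last_day=tuesday),
]
print(ready_to_evaluate(test))  # False: too few impressions and only one day of data
```

Even though the raw numbers point to a confident winner, the gate keeps the test running until every ad has a full week and enough impressions.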

The first part of defining minimum data is to determine the type of metric you’re using for testing.

For instance, CTR doesn’t use conversions in its calculations, so there is no need to define minimum conversions when testing by CTR.

Some metrics are ‘yes’ or ‘must use’ in defining your minimum data as they represent volume. The ‘optional’ metrics may not be necessary as they are secondary and represent action.

Lastly, timeframes must be defined for every test.

You should always use at least a week since search behavior changes throughout a week.

However, using longer time frames such as a month is completely acceptable and can help increase your confidence levels.

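Testing Metric | Impressions | Clicks | Conversions | Time Frame
CTR | Yes | Optional | – | Yes
CPA | – | – | Yes | Yes
Conversion rate | – | Optional | Yes | Yes
CPI | Yes | – | Optional | Yes
ROAS | – | – | Yes | Yes
RPI | – | – | Yes | Yes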

Once you have determined your testing metrics and the minimum data you need to define, the second step is to choose the actual numbers.

The chart below is a rough guideline to minimum data.

As some accounts get millions of impressions each week and others get just a few, it’s impossible to define minimum data globally for everyone.

Treat these numbers as the recommended minimum; using higher numbers is fine, especially if you accumulate a lot of data every week.

Traffic Level | Impressions | Clicks | Conversions
Low Traffic | 350 | 300 | 7
Mid Traffic | 750 | 500 | 13
High Traffic | 1,000 | 1,000 | 20
Well-known Brand Terms | 100,000 | 10,000 | 100-1,000

Once you have defined your minimum data and minimum confidence levels, you should then define your maximum data.

Maximum Data

Not every test you run will result in a winner and a loser.

In some cases, after a couple months of running an ad test, you will have less than a 90% confidence that one ad is better or worse than another one.

To ensure these non-statistically significant test results do not run indefinitely, you also want to define maximum data.

Maximum data is the most data you want before you end an ad test regardless of the results.

There are two easy ways to define maximum data:

  1. 3 months
  2. 10x your minimum data

It’s easier to just define a 3-month time frame: if a test has been running longer than that and has reached minimum data, end it.

A more scientific method is to use 10x your minimum data. This is harder to track, and thus 3 months is an easier number to work with.
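Putting minimum data, confidence, and maximum data together, a test’s next step could be decided as below. This is a sketch under stated assumptions: 3 months is approximated as 90 days, minimum data is measured in impressions only, and the test ends as soon as either maximum-data rule trips (you could just as well use only one of them). The function and parameter names are hypothetical.

```python
from datetime import date


def next_step(confidence: float, required_confidence: float,
              reached_minimum_data: bool, start: date, today: date,
              impressions: int, minimum_impressions: int,
              max_days: int = 90, max_multiple: int = 10) -> str:
    """Decide what to do with an ad test (hypothetical helper)."""
    if not reached_minimum_data:
        return "keep running"  # too early to read the results at all
    if confidence >= required_confidence:
        return "pick winner"
    ran_too_long = (today - start).days >= max_days                     # ~3 months
    too_much_data = impressions >= max_multiple * minimum_impressions  # 10x minimum data
    if ran_too_long or too_much_data:
        return "end test, no winner"
    return "keep running"


print(next_step(0.82, 0.90, True, date(2024, 1, 1), date(2024, 4, 15),
                impressions=2000, minimum_impressions=350))  # end test, no winner
```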

Taking Action

Once an ad test has achieved results, it’s time to take action.

For single ad group tests, your options are:

  • Pause the loser ads
  • (optional) Create a new challenger ad off the winning ad

For multi ad group tests, your options are:

  • Pause the loser ads across all ad groups
  • (optional) Replace the losing text line with a new text line to test
  • (optional) Create a new multi ad group test with a different hypothesis across the ad groups used in the test
  • Examine and make note of any insights

The last bullet is very important. 

What did you learn from your multi ad group test?

This information can be used in other places within your account, or even within your landing pages and other marketing material. For instance, we often see companies use the insights from multi ad group testing in these ways:

  • Use your best CTA in your emails to drive visits to your website
  • Use your best headline in your SEO efforts to increase organic CTR
  • Use benefits, calls to action, or headlines on landing pages
  • Use images that won in emails or in social media
  • Etc

As multi ad group testing gives you insights across an entire segment of your PPC account, you want to make note of those insights, share them with your team, and see how else you can use that insight in your marketing efforts.

Over To You

Ads are the only part of your paid search accounts that users actually see.

Your ads serve as the bridge between the web and your website/landing page.

A compelling ad can induce a user to cross the bridge from looking at your ad to visiting your site and ultimately becoming a customer.

To find that compelling ad, you want to test your ads scientifically using these steps. Humans love patterns, and you’ll find patterns in random chunks of data that have no meaning. This means that many people will just eyeball the stats in an ad group, decide that one ad’s data looks better than another’s, and pause the loser.

Unfortunately, this often makes the account worse, because the data was either random or there wasn’t enough of it for a real pattern to emerge.

As ads are the lifeblood of any PPC account, you want to ensure that you’re making data driven decisions in your PPC efforts. By following the scientific guide to ad testing, you’ll ensure you’re only making decisions, and finding user insights, when the data and testing principles are sound and solid so you can continue to improve the profits from your paid search accounts.
