Why Results Don’t Simply Double After Merging a 50/50 Aampe vs. Business-as-Usual Split

Many customers begin by running Aampe alongside their business-as-usual (BAU) messaging approach. A common design is a 50/50 split: after reserving a small group of users as a global holdout, half of the remaining users receive communication from Aampe while the other half continue with BAU. When the Aampe half shows strong results, customers sometimes expect that switching over the other half - doubling the number of users served by Aampe agents - will double impressions or transactions.

That expectation is understandable, but it misses some important dynamics. There are structural reasons why results don’t scale linearly from a split test to a full rollout. Below are the five key reasons, followed by what can be done about them.

1. Cold Start for Personalization

Aampe is a personalization system. Its agents don’t just apply broad averages—they learn individual behavior. That means each new user entering the system begins in a learning phase. The agent has to test message timing, channels, and content combinations to discover what works for that individual.

During this period, performance is lower because exploration is happening. For most of the users who have been with Aampe since the beginning, that phase is behind them. For the newly added BAU users, it is just beginning.
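To see why the learning phase depresses early numbers, here is a minimal sketch of exploration-based learning: a generic Thompson-sampling bandit choosing among three hypothetical message variants with made-up response rates. This is an illustration of the cold-start dynamic, not Aampe's actual agent logic, which learns timing, channel, and content jointly.

```python
import random

# Hypothetical response rates for three message variants (illustrative only).
TRUE_RATES = {"variant_a": 0.02, "variant_b": 0.05, "variant_c": 0.11}

def pick_variant(successes, failures):
    """Thompson sampling: choose the variant with the highest Beta draw."""
    draws = {
        v: random.betavariate(successes[v] + 1, failures[v] + 1)
        for v in TRUE_RATES
    }
    return max(draws, key=draws.get)

def simulate(n_sends=2000, seed=42):
    random.seed(seed)
    successes = {v: 0 for v in TRUE_RATES}
    failures = {v: 0 for v in TRUE_RATES}
    conversions = 0
    for i in range(1, n_sends + 1):
        variant = pick_variant(successes, failures)
        converted = random.random() < TRUE_RATES[variant]
        successes[variant] += converted
        failures[variant] += (not converted)
        conversions += converted
        if i in (100, 500, 2000):
            print(f"after {i:>4} sends: cumulative conversion rate = {conversions / i:.3f}")

simulate()
# Early sends are spent exploring the weaker variants, so the cumulative
# rate starts well below the best variant's 0.11 and climbs toward it
# as the learner concentrates on what works for this user.
```

The shape of that curve is the point: a newly added user's first weeks are dominated by exploration, so their numbers understate where they end up once the agent has learned their preferences.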

What to do: Be patient. Results improve as the system learns. Early impressions and transactions aren’t the end state—they’re part of the path to steady-state optimization.

2. Different Starting Points for BAU Users

The users who were kept under BAU have a different history. They were trained, consciously or not, to expect BAU’s cadence, tone, or product focus. For example, they may have been habituated to promotions around a narrow set of product lines. That creates inertia.

When those users are moved to Aampe, the agents have to work harder to overcome these patterns. They may need to redirect attention, rebuild responsiveness, or break old expectations. That takes time, and in some cases the habits may be too ingrained to change. That is one of the costs of doing a split test.

What to do: Again, patience helps. Some users can be “re-trained” over time. Others may not move as far. The key is to understand that a history of BAU interactions makes those users systematically different from the users who were always on Aampe. They start out qualitatively different from anything Aampe agents have seen up to that point, and many of them may remain so.

3. Seasonality and Timing Effects

The earlier split test didn’t happen in a vacuum. It occurred in a particular season, with specific promotions, pay cycles, and external conditions. Some of the apparent Aampe uplift was due to Aampe itself, but some was due to those seasonal or random factors.

When the system goes to 100%, the season may have changed, promotions may have ended, or baseline demand may have shifted. That creates a natural difference in observed results.

What to do: Nothing. Seasonality is a real feature of markets. The best way to handle it is to measure performance over a long enough horizon to smooth out these shifts, rather than expecting short-term numbers to extrapolate cleanly.

4. Content Inventory Limitations

Aampe’s agents can only recommend from the content you make available. If that set is narrow, performance will plateau because the system has fewer options to match with user preferences. When the user base doubles, the odds increase that some users won’t see content that speaks directly to them, which dampens results.
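A simple, purely illustrative model shows why content variety matters more than audience size here. If each message independently “resonates” with a given user with some probability p, the chance that a user has at least one resonant message in a pool of k messages is 1 - (1 - p)^k. That chance depends on the content pool, not on how many users you add.

```python
# Illustrative model only: each message resonates with a given user
# with probability p, independently. With k messages available, the
# chance a user has at least one resonant message is 1 - (1 - p)**k.
def coverage(p: float, k: int) -> float:
    return 1 - (1 - p) ** k

for k in (5, 10, 20, 40):
    print(f"{k:>2} messages -> {coverage(0.10, k):.0%} of users have at least one match")
# With p = 0.10:  5 messages -> 41%,  10 -> 65%,  20 -> 88%,  40 -> 99%.
# Doubling the user base does not change these odds; expanding the
# content pool does.
```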

What to do: Expand the variety of content. Think about covering different product lines, customer motivations, and use cases. More diversity in content gives agents more “levers” to test, learn from, and optimize. This is the single most important thing you can do - more on this below.

5. Regression to the Mean

Pilot tests often look unusually strong. By chance, the conditions of the split—user composition, timing, noise—can create a result that’s above the long-run average. When you move to full rollout, the numbers naturally “settle down” toward their true average.

This isn’t unique to Aampe. It’s a statistical principle seen in any controlled test. The initial jump looks bigger than what will persist at scale.
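A toy simulation (all numbers made up) makes the principle concrete: if many cohorts share the same true conversion rate, the cohort that happened to look best in a small pilot window will read lower when re-measured on a full-scale population.

```python
import random

# Toy illustration of regression to the mean; every cohort has the same
# true conversion rate, but a small pilot window lets chance make some
# cohorts look unusually strong.
TRUE_RATE = 0.05
PILOT_USERS = 500
ROLLOUT_USERS = 50_000

def observed_rate(n_users: int, rate: float) -> float:
    return sum(random.random() < rate for _ in range(n_users)) / n_users

random.seed(7)
pilot_rates = [observed_rate(PILOT_USERS, TRUE_RATE) for _ in range(200)]
luckiest_pilot = max(pilot_rates)
rollout_rate = observed_rate(ROLLOUT_USERS, TRUE_RATE)

print(f"luckiest pilot cohort:    {luckiest_pilot:.3f}")  # well above 0.05
print(f"full-rollout measurement: {rollout_rate:.3f}")    # settles near 0.05
```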

What to do: Accept this as a natural phenomenon. It’s not preventable. The right response is to focus on long-term, sustainable lift, not on replicating the exact magnitude of the pilot effect.

Double-Down on Content Variety

It is reasonable - and important - not to expect results to double when moving from a 50/50 split to full adoption. The cold start effect, different user histories, seasonality, content limits, and regression to the mean all interact to shape outcomes.

Of these, some factors can’t be changed (seasonality, regression), and others resolve with time (cold start, user history). But one factor is fully in your control, and it is the single most addressable lever for post-rollout performance: content variety.

  • Personalization thrives on choice. If ten messages all highlight the same value proposition, agents can only learn about sensitivity to that source of value. But if messages instead span a wide variety of value propositions, the system can uncover entirely new pockets of demand.

  • Coverage creates discovery. Aampe’s strength is not in broadcasting the “average” message but in finding niches that would otherwise go unnoticed. That only happens if content exists to activate those niches.

  • Better matches raise ceilings. As more users enter the system, the performance ceiling depends on whether each user has at least one message that resonates with their individual preferences. Without that, personalization is bottlenecked.

