New on arXiv: validating Aampe performance with adaptive synthetic control groups

May 3, 2023

Schaun Wheeler

When people start using Aampe, they naturally want to understand how the system is performing. They also understand that Aampe can’t take credit for every app visit, add-to-cart, or purchase that happens shortly after a message is sent. Every app has a baseline level of activity: the amount of activity you would see if you did no messaging at all. Any estimate of the effectiveness of a particular messaging strategy should take that baseline into account.

One common way to validate performance is with a holdout group. There are several reasons why this is not a great idea, perhaps the clearest being that it creates apples-to-oranges comparisons when you’re adapting your messaging over time. Variations on the holdout group, such as a switchback holdout or a synthetic control, don’t solve the fundamental problem: when you’re constantly changing your messaging in reaction to user responses, any holdout has a very limited shelf life. Holdouts work well for single A/B tests that have a specific start date and stop date. They don’t work well for continuous adaptation.

That’s why we developed an adaptive synthetic control group, based on the same theoretical foundation as the Coarsened Exact Matching method developed by Gary King and his colleagues at Harvard. The method works by binning users into categories and then matching users who received a message with users who occupy the same bin but did not receive a message. We designate a monitoring window for each individual message sent. To find a control for that message, we identify a user who was at a similar level of activity on the app and who received a similar message recently, but who did not receive any message whose monitoring window overlaps with that of the message for which we’re seeking a control.
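
To make the matching idea concrete, here is a minimal sketch in Python. It is not Aampe’s implementation: the `Message` record, the 24-hour monitoring window, the 14-day lookback, the activity cut points, and the `find_control` helper are all assumptions chosen for illustration. It only shows the general shape of the technique: coarsen activity into bins, then look for a user in the same bin who got a similar message recently but has no message whose window overlaps the treated one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

WINDOW = timedelta(hours=24)    # assumed monitoring window length
LOOKBACK = timedelta(days=14)   # assumed meaning of "recently"


@dataclass(frozen=True)
class Message:
    """Hypothetical record of one message send."""
    user_id: str
    sent_at: datetime
    message_type: str  # stand-in for whatever makes two messages "similar"


def activity_bin(event_count: int) -> str:
    """Coarsen a raw activity count into a bin (cut points are illustrative)."""
    if event_count == 0:
        return "none"
    if event_count <= 5:
        return "low"
    if event_count <= 20:
        return "medium"
    return "high"


def find_control(treated: Message,
                 messages: list[Message],
                 activity: dict[str, int]) -> Optional[str]:
    """Return the id of one eligible control user for `treated`, or None."""
    target_bin = activity_bin(activity.get(treated.user_id, 0))

    # Group every send by user so each candidate's full history can be checked.
    by_user: dict[str, list[Message]] = {}
    for m in messages:
        by_user.setdefault(m.user_id, []).append(m)

    for user_id, sent in sorted(by_user.items()):
        if user_id == treated.user_id:
            continue
        # Exact match on the coarsened activity bin.
        if activity_bin(activity.get(user_id, 0)) != target_bin:
            continue
        # Equal-length windows overlap when their starts are closer than WINDOW.
        if any(abs(m.sent_at - treated.sent_at) < WINDOW for m in sent):
            continue
        # The candidate must have received a similar message recently.
        if any(m.message_type == treated.message_type
               and timedelta(0) < treated.sent_at - m.sent_at <= LOOKBACK
               for m in sent):
            return user_id
    return None


# Made-up example data.
t0 = datetime(2023, 5, 1, 12, 0)
treated = Message("u1", t0, "promo")
messages = [
    treated,
    Message("u2", t0 - timedelta(hours=2), "promo"),  # window overlaps
    Message("u3", t0 - timedelta(days=3), "promo"),   # eligible control
    Message("u4", t0 - timedelta(days=3), "promo"),   # different activity bin
]
activity = {"u1": 4, "u2": 3, "u3": 2, "u4": 50}
control = find_control(treated, messages, activity)
```

In this toy data, `u2` is excluded because its message falls inside the treated message’s window, and `u4` is excluded because it sits in a different activity bin, which leaves `u3` as the control. Comparing outcomes for the treated user and the control over the same monitoring window then gives a baseline-adjusted estimate for that one message.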

This allows us to evaluate the performance of our personalization scores (the metrics we use to determine which messaging choices an individual user will respond to).

It also allows us to calculate precise attribution estimates, which we expose in our Composer tool.

You can find more details about the method and our validation techniques in our technical paper on arXiv.


