When evaluating an adaptive system like Aampe, the most common question is: “What’s the lift?”
It’s an understandable reflex. Lift is easily measurable. With lift numbers, you can compare system A to system B and say, “This one wins.”
This is the problem with metrics: we tend to confuse what we can measure with what really matters. Just because lift is relatively easy to measure doesn't mean it's what we should focus on.
Lift is short-term by design
It tells you what happened immediately after a change. Did this value proposition get more opens? Did this product category get more clicks? But the things that actually matter in human-facing systems - trust, satisfaction, retention, loyalty - don’t show up in the next log entry. They accrue slowly. A system that shows zero short-term lift but treats people better may yield much higher long-term value.
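To make that concrete, here's a toy calculation. The numbers are invented for illustration (they don't come from Aampe or any real system), but they show how a system that wins the immediate lift comparison can still lose on cumulative value per user once retention is in the picture:

```python
# Toy numbers only: nothing here is drawn from real data.
# System A wins the immediate lift comparison; system B retains users better.

def cumulative_value(conversion_rate, monthly_retention, months=24):
    """Expected conversions per acquired user over `months`, allowing for churn."""
    total, survival = 0.0, 1.0
    for _ in range(months):
        total += survival * conversion_rate  # value from users still active
        survival *= monthly_retention        # some users churn each month
    return total

system_a = cumulative_value(conversion_rate=0.105, monthly_retention=0.80)  # +5% lift
system_b = cumulative_value(conversion_rate=0.100, monthly_retention=0.90)  # zero lift

print(f"A (short-term lift, weaker retention): {system_a:.2f} conversions per user")
print(f"B (no lift, stronger retention):       {system_b:.2f} conversions per user")
# Over 24 months, B delivers roughly 75% more value per user despite "losing" on lift.
```

The exact figures don't matter; the point is that a next-week lift number says nothing about which curve you're on.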
Lift assumes a purely instrumental relationship
It asks: Did I get the user to do the thing? That frames the user as a means to an end, but most people don’t want to feel optimized. Systems that treat people as individuals — with histories, preferences, context — don’t just perform better. They create qualitatively better experiences.
Lift rewards opportunism, not intelligence
You can often get lift by exploiting quirks in behavior: urgency language, timing hacks, selective targeting. That doesn’t mean you or your system understand anything useful about the world. It just means you found a trick. If the goal is robust learning and improved user experience, lift might be the wrong scorecard.
Lift hides behavioral diversity
A system can improve average lift by doing a better job on users who already convert well, while doing nothing (or worse) for others. If you care about broad coverage, individual alignment, or inclusive performance, lift alone won’t tell you how well you’re doing.
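Here's a minimal sketch of how that happens, with made-up segment numbers: concentrate the gains on users who already convert well, and the aggregate looks impressive even though most users saw no change and some got a worse experience.

```python
# Hypothetical segment-level results; all numbers are invented for illustration.

segments = {
    # segment: (share of users, control conv. rate, treated conv. rate)
    "already-engaged": (0.20, 0.30, 0.39),   # big gain for habitual converters
    "occasional":      (0.50, 0.05, 0.05),   # no change at all
    "dormant":         (0.30, 0.02, 0.015),  # slightly worse
}

control = sum(share * c for share, c, _ in segments.values())
treated = sum(share * t for share, _, t in segments.values())

print(f"aggregate lift: {(treated - control) / control:+.1%}")  # about +18%
for name, (share, c, t) in segments.items():
    print(f"  {name}: {(t - c) / c:+.1%} lift on {share:.0%} of users")
# An +18% headline number, while 80% of users saw zero benefit or got worse.
```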
Lift is static in a dynamic world
It assumes a fixed contest: “Which model performs better right now?” But adaptive systems evolve. The right question isn’t just who wins today, but rather who keeps improving. Who adapts gracefully to new users? Who scales with minimal retraining? Who accumulates useful structure over time?
So if not lift, then what? There are better ways to ask whether a system is doing good work:
Does it model individual behavior with fidelity?
Does it respond quickly to change?
Does it serve all users, not just the responsive ones?
Do outcomes improve per user, not just in aggregate?
Does it reduce friction and increase relevance?
Does it align with how people actually want to be treated?
Those are harder to measure, but they’re closer to the truth of what makes a system valuable.
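Harder, but not impossible. As one illustration, the per-user question above can be made concrete by looking at the distribution of individual changes rather than a single average. The metric (weekly sessions) and the data below are hypothetical:

```python
# A sketch of one way to operationalize "do outcomes improve per user?"
# `before` and `after` map user IDs to a hypothetical outcome (weekly sessions).

before = {"u1": 2.0, "u2": 5.0, "u3": 0.5, "u4": 1.0}
after  = {"u1": 1.9, "u2": 9.0, "u3": 0.4, "u4": 1.0}

deltas = [after[u] - before[u] for u in before]

mean_change = sum(deltas) / len(deltas)
share_improved = sum(d > 0 for d in deltas) / len(deltas)

print(f"mean change:    {mean_change:+.2f}")   # +0.95: looks great in aggregate
print(f"users improved: {share_improved:.0%}") # 25%: only one user actually gained
```

One heavy user improving a lot can carry the average while most users stand still or slip, which is exactly what an aggregate lift number would hide.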
Lift isn't a bad thing, but as a measure of performance it's myopic; making it the sole focus puts both you and your users at risk.