May 28, 2025
Schaun Wheeler

Beyond Multi-Armed Bandits: Understanding Aampe's Semantic-Associative Agents

People sometimes ask whether our system is a kind of multi-armed bandit. It’s not. But that’s not a bad place to start if you want a familiar reference point.

Our semantic-associative agents use the same basic intuition: take actions, observe outcomes, and update preferences. But two key differences make this something else entirely:


  1. Multi-dimensional action space

    In a typical bandit problem, the agent chooses from a flat set of discrete actions—pull arm A, B, or C. Each action is assumed to be atomic and independent. Even when bandits are extended into contextual or combinatorial forms, they still often treat each action as a point in a single, unified decision space.

    Real-world decision-making—especially in applications like customer engagement—isn’t like that. You’re not just choosing “an action.” You’re selecting a profile made up of choices across several intersecting dimensions: time of day, day of week, message channel, content theme, offer type, incentive level, tone, subject line, etc. Each of these is its own action set, and the agent must learn how these dimensions interact—both with each other and with user behavior. The task isn’t just to find the best arm, but to learn a combinatorial space of micro-preferences and then select a coherent, deliverable action bundle that fits (the sketch after this list makes the structure concrete).


  2. Non-ergodic learning

    Most bandit systems assume some form of ergodicity—the idea that statistical insights gained from one user’s behavior can generalize to another. In an ergodic system, learning can be pooled: we assume that averages across time and averages across the population converge. That makes for efficient learning, especially when individual data is sparse.

    But user behavior in domains like messaging or content interaction is not ergodic. People differ—not just in preferences, but in responsiveness, habits, intent, timing, and attention. Treating these differences as noise and trying to learn a global average flattens signal that actually matters. Our agents treat each user as their own environment. They don’t generalize across users. They build up individualized models based solely on that user’s interaction history, which lets them preserve and act on genuine behavioral variance instead of averaging it away.
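
To make both contrasts concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption (the class name, the example dimensions, the Beta-sampling rule), not Aampe's actual implementation. The structure is the point: each user gets their own agent that learns only from that user's history, and each agent holds per-dimension preference estimates that get assembled into a full action bundle rather than a single arm.

```python
import random

# Illustrative action dimensions; a real system would have more
# (offer type, incentive level, subject line, etc.).
DIMENSIONS = {
    "time_of_day": ["morning", "afternoon", "evening"],
    "channel": ["push", "email", "sms"],
    "tone": ["urgent", "friendly", "informative"],
}

class UserAgent:
    """One agent per user: it learns only from this user's history
    (non-ergodic), never from a population average."""

    def __init__(self, dimensions):
        # Beta(successes + 1, failures + 1) per value, per dimension.
        self.counts = {
            dim: {v: [1, 1] for v in values}
            for dim, values in dimensions.items()
        }

    def choose_profile(self):
        # Thompson-style sampling, one draw per dimension. A toy
        # simplification: dimensions are sampled independently here,
        # whereas the post argues a real system must also learn
        # cross-dimension interactions.
        profile = {}
        for dim, values in self.counts.items():
            samples = {v: random.betavariate(a, b) for v, (a, b) in values.items()}
            profile[dim] = max(samples, key=samples.get)
        return profile

    def update(self, profile, engaged):
        # Credit every dimension of the delivered bundle with the outcome.
        for dim, value in profile.items():
            a, b = self.counts[dim][value]
            self.counts[dim][value] = [a + engaged, b + (1 - engaged)]

# Each user is their own environment: separate agents, separate learning.
agents = {user_id: UserAgent(DIMENSIONS) for user_id in ["u1", "u2"]}
bundle = agents["u1"].choose_profile()  # e.g. {'time_of_day': 'evening', ...}
agents["u1"].update(bundle, engaged=1)  # only u1's model changes
```

Even in this toy, the two structural differences survive: preferences live per dimension rather than per arm, and per user rather than per population.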

So while it’s tempting to think of this as a fancy bandit setup, that framing misses what’s actually happening. It isn’t a bandit variant; it’s a structurally different approach. Bandits are a useful metaphor to start with, but the differences are architectural, not cosmetic.

And to be clear: none of this depends on LLMs. An LLM is just an actor—it takes context and produces plausible outputs. Our learning agents run upstream of that. They’re responsible for producing the right context in the first place, based on what they’ve learned about how a particular user responds to different combinations of actions. That context can then drive the LLM, or be used to select from a content library indexed to the same profile space.
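
As a rough sketch of that division of labor (the function names and library structure here are assumptions for illustration, not Aampe's API), the same learned profile can either parameterize an LLM prompt or index into a pre-written content library:

```python
# Hypothetical glue code: the agent's learned profile is the context;
# what consumes it (an LLM or a content library) is interchangeable.

def profile_to_prompt_context(profile: dict) -> str:
    """Render a learned profile as context for an LLM prompt."""
    return (
        f"Write a {profile['tone']} {profile['channel']} message "
        f"for a user most responsive in the {profile['time_of_day']}."
    )

# ...or select from a pre-written library indexed by the same profile space.
CONTENT_LIBRARY = {
    ("friendly", "push"): "Hey! We saved something you might like.",
    ("urgent", "email"): "Your offer expires tonight.",
}

def select_content(profile: dict) -> str | None:
    return CONTENT_LIBRARY.get((profile["tone"], profile["channel"]))
```

Either way, the knowledge about this particular user lives in the profile, not in the actor that renders it into a message.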
