Agentic systems don't operate over a huge, chaotic message space. They operate over structured action sets — defined semantic categories that make up a treatment policy.
At Aampe, a typical treatment policy might be composed from action sets: day of week, time of day, channel, value proposition, product/offering, tone of voice, incentive level/type, and so on.
Each set has maybe 3 to 20 possible actions. So yes, the space is combinatorially large—but it's semantically tractable. The agent's task isn't to predict the perfect message. It's to select the best combination of these features for a given user at a given moment. That policy then determines what content gets sent.
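To make "structured action sets" concrete, here is a minimal sketch: a handful of hypothetical sets, with a policy formed by picking one action from each. The set names and values are illustrative, not Aampe's actual catalog.

```python
from math import prod

# Hypothetical action sets: names and values are illustrative, not Aampe's actual catalog.
action_sets = {
    "day_of_week": ["Mon", "Wed", "Fri", "Sun"],
    "time_of_day": ["morning", "afternoon", "evening"],
    "channel": ["push", "email", "in_app"],
    "value_proposition": ["save_money", "save_time", "discover"],
    "tone_of_voice": ["playful", "urgent", "matter_of_fact"],
    "incentive": ["none", "low_discount", "high_discount"],
}

# A treatment policy is one action chosen from each set.
example_policy = {name: actions[0] for name, actions in action_sets.items()}

# The full space is the Cartesian product of the sets: combinatorially large,
# but every dimension stays small and semantically meaningful.
n_policies = prod(len(actions) for actions in action_sets.values())
print(example_policy)
print(f"{n_policies} possible policies")  # 4 * 3 * 3 * 3 * 3 * 3 = 972
```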
The main complexity of user engagement is in the user space — all of the different events that can happen, either individually or in sequence, in a user's journey through the app. Agents avoid getting bogged down in that complexity by modeling the action space instead of the user space. The user space gets handled via policy selection rather than direct modeling.
That's not just a design choice. It's the reality of online, real-time, organic human behavior. Every customer's behavior is messy, dynamic, and only partially observable. Some users are effectively cold starts. Others have interacted enough that the agent has built confidence in their preferences.
Agentic learners track this directly: every action is represented as a beta distribution, continuously updated per user. So for any feature — say, "push" vs. "email," or "high discount" vs. "low urgency" — the agent knows:
what it expects the reward to be, and
how confident it is in that estimate.
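A minimal sketch of that per-user bookkeeping, assuming a simple binary reward (engaged / didn't engage) and one Beta posterior per action; the class and example values are illustrative, not Aampe's implementation.

```python
from dataclasses import dataclass

@dataclass
class BetaArm:
    """Beta posterior over one action's reward rate for one user.

    Beta(1, 1) is uniform, i.e. a cold start: no opinion yet."""
    alpha: float = 1.0
    beta: float = 1.0

    def update(self, reward: bool) -> None:
        # Each observed outcome (e.g. engaged / didn't engage after a send)
        # shifts the posterior toward the user's revealed preference.
        if reward:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def expected_reward(self) -> float:
        # What the agent expects the reward to be.
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        # How confident it is: lower variance = more confidence.
        n = self.alpha + self.beta
        return (self.alpha * self.beta) / (n * n * (n + 1))

# One posterior per (user, feature, action), e.g. channel preferences for one user:
channels = {"push": BetaArm(), "email": BetaArm(), "in_app": BetaArm()}
channels["push"].update(True)    # engaged with a push notification
channels["email"].update(False)  # ignored an email
print(channels["push"].expected_reward, channels["push"].variance)
```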
When the agent selects a policy, it's balancing expected value and uncertainty. Some actions are taken because they're strong bets. Others are taken because uncertainty is high and exploration is needed.
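One standard way to strike that balance is Thompson sampling: draw a sample from each action's posterior and pick the highest draw. The sketch below illustrates the idea using per-action (alpha, beta) counts like those above; it is not necessarily the exact selection rule the agents use, and the numbers are made up.

```python
import random

def thompson_select(arms: dict[str, tuple[float, float]]) -> str:
    """Pick one action from a set: sample each Beta posterior, take the top draw.

    arms maps action name -> (alpha, beta). A well-estimated strong action usually
    wins (a confident, exploitative pick), but a wide posterior can still produce
    the highest draw, so uncertain actions keep getting explored.
    """
    draws = {name: random.betavariate(a, b) for name, (a, b) in arms.items()}
    return max(draws, key=draws.get)

# A full policy is one selection per action set for this user.
user_arms = {
    "channel": {"push": (8, 2), "email": (3, 5), "in_app": (1, 1)},
    "incentive": {"none": (4, 4), "low_discount": (6, 2), "high_discount": (1, 1)},
}
policy = {feature: thompson_select(arms) for feature, arms in user_arms.items()}
print(policy)  # e.g. {'channel': 'push', 'incentive': 'low_discount'}
```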
Because treatments are structured and every send is tied to a selected policy, we can analyze outcomes in detail (a sketch follows this list):
Which features are being chosen confidently vs. tentatively?
How often do exploratory actions outperform exploitative ones?
How does agent learning translate into measurable gains over time?
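A toy illustration of that kind of analysis, assuming a flat send log that records the selection mode for each message; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical send log: one row per message, with the selection mode recorded at
# send time and the outcome observed afterwards. Columns and values are made up.
sends = pd.DataFrame({
    "use_case":  ["activation"] * 4 + ["retention"] * 4,
    "mode":      ["explore", "explore", "exploit", "exploit"] * 2,
    "converted": [1, 0, 1, 1, 0, 1, 1, 1],
})

# Conversion rate by use case and selection mode...
rates = sends.groupby(["use_case", "mode"])["converted"].mean().unstack("mode")

# ...and the percent difference between exploit-mode and explore-mode sends,
# the same quantity the table below summarizes per use case and app event.
rates["exploit_vs_explore_pct"] = (rates["exploit"] / rates["explore"] - 1) * 100
print(rates)
```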
This is the core advantage of agentic personalization: it doesn't just deliver content. It runs structured experiments, learns revealed preferences, and adapts policy decisions at the feature level for every individual user.
The table below shows the percent difference in performance between exploit-mode and explore-mode messages for one of our customers. The columns represent eight different use cases agents were assigned to orchestrate, and each row represents a different app event. The higher the value (and the bluer the cell), the more exploit-mode messages outperformed explore-mode messages.