Here is a diagram of our agentic architecture (well, part of it). See the box in the top right labeled "recommender service"? Let's talk about that. At Aampe, we split copy personalization into two distinct decisions:
Which item to recommend
How to compose the message that delivers it
Each calls for a different approach.
For item recommendations, we use classical recommender systems: collaborative filtering, content-based ranking, etc. These are built to handle high-cardinality action spaces — often tens or hundreds of thousands of items — by leveraging global similarity structures among users and items.
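The post doesn't detail the recommender service's internals, so here is a minimal item-item collaborative filtering sketch, just to make the "global similarity structure" idea concrete. The function name and toy interaction matrix are illustrative, not Aampe's actual pipeline:

```python
import numpy as np

def item_item_scores(interactions: np.ndarray, user_idx: int) -> np.ndarray:
    """Score all items for one user via item-item cosine similarity.

    interactions: (n_users, n_items) implicit-feedback matrix,
    e.g. 1.0 if the user engaged with the item, else 0.0.
    """
    # Normalize item columns so dot products become cosine similarities.
    norms = np.linalg.norm(interactions, axis=0, keepdims=True)
    normalized = interactions / np.maximum(norms, 1e-12)

    # (n_items, n_items) similarity matrix: the "global structure"
    # learned from all users' behavior, not just this one user's.
    item_sim = normalized.T @ normalized

    # A user's score for each item is the similarity-weighted sum of
    # the items they already interacted with.
    return item_sim @ interactions[user_idx]

# Toy example: 4 users x 5 items.
interactions = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
], dtype=float)

scores = item_item_scores(interactions, user_idx=0)
recommended = np.argsort(-scores)  # best-scoring items first
```

The point of the sketch: user 0 never touched item 2, but similar users did, so it scores well anyway. That cross-user generalization is what makes recommenders workable at high cardinality.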
For message personalization, we take a different route. Each user has a dedicated semantic-associative agent that composes messages modularly — choosing tone, value proposition, incentive type, product category, and call to action. These decisions use a variant of Thompson sampling, with beta distributions derived from each user’s response history.
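Here is a minimal sketch of the Beta-Bernoulli Thompson sampling scheme described above: one Beta posterior per option within each message dimension, a sample drawn from each posterior, and the highest-sampling option kept per dimension. The dimension taxonomy and the all-components-share-credit update rule are illustrative assumptions, not Aampe's actual implementation:

```python
import random
from collections import defaultdict

# Dimension/option names here are illustrative, not Aampe's taxonomy.
DIMENSIONS = {
    "tone": ["playful", "urgent", "informative"],
    "value_prop": ["savings", "convenience", "quality"],
    "incentive": ["discount", "free_shipping", "none"],
    "cta": ["shop_now", "learn_more", "claim_offer"],
}

class CopyAgent:
    """One agent per user: a Beta posterior over the engagement rate
    of every option in every message dimension."""

    def __init__(self):
        # Beta(1, 1) prior: uniform over engagement rates.
        self.alpha = defaultdict(lambda: 1.0)  # successes + 1
        self.beta = defaultdict(lambda: 1.0)   # failures + 1

    def compose(self) -> dict:
        """Thompson sampling: draw one rate sample per option,
        keep the best-sampling option in each dimension."""
        message = {}
        for dim, options in DIMENSIONS.items():
            samples = {
                opt: random.betavariate(self.alpha[(dim, opt)],
                                        self.beta[(dim, opt)])
                for opt in options
            }
            message[dim] = max(samples, key=samples.get)
        return message

    def update(self, message: dict, engaged: bool):
        """Credit every component of a sent message with the outcome."""
        for dim, opt in message.items():
            if engaged:
                self.alpha[(dim, opt)] += 1.0
            else:
                self.beta[(dim, opt)] += 1.0

agent = CopyAgent()
msg = agent.compose()            # e.g. {"tone": "urgent", ...}
agent.update(msg, engaged=True)  # fold the user's response back in
```

Sampling from the posterior, rather than always taking its mean, is what balances exploration and exploitation: options the agent is still uncertain about occasionally draw high samples and get tried.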
Why split the system this way? Sometimes you want to send content without recommending an item — having two separate processes makes that easier. But there are deeper reasons why recommender systems suit item selection and reinforcement learning suits copy composition:
Cardinality
The item space is vast, so per-user trial and error would never converge; recommenders generalize across users and items instead. Copy has a far smaller, more personal action space, where direct exploration works well.
Objectives
Item recommendations aim at discovery — surfacing new or long-tail content. Copy is about resonance — hitting the right tone based on past responses.
Decision structure
Item selection is often a single decision. Copy is modular: interdependent parts that must cohere. That makes it a natural fit for RL over structured actions, as in the composition sketch above.
Hidden dimensions
Item preferences stem from stable traits like taste or relevance. Copy preferences shift quickly and depend on context, which suits RL's recency-weighted learning (see the sketch after this list).
Reward density
Item responses are sparse: most users never interact with most items. Copy is different. Every content delivery yields feedback, dense enough to train RL agents if interpreted correctly.
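The post doesn't say how recency weighting is implemented, so here is one common mechanism, assumed purely for illustration: decay the Beta counts toward the prior before each update, so stale evidence fades and the posterior tracks recent behavior:

```python
def decayed_update(alpha: float, beta: float,
                   engaged: bool, decay: float = 0.95):
    """Recency-weighted Beta update (assumed mechanism, not confirmed
    by the post): shrink old counts toward the Beta(1, 1) prior before
    adding the new observation, so recent feedback dominates."""
    alpha = 1.0 + decay * (alpha - 1.0)
    beta = 1.0 + decay * (beta - 1.0)
    if engaged:
        alpha += 1.0
    else:
        beta += 1.0
    return alpha, beta

# After many stale successes, a short run of recent failures quickly
# pulls the posterior mean back down.
a, b = 10.0, 2.0
for _ in range(5):
    a, b = decayed_update(a, b, engaged=False)
print(a / (a + b))  # posterior mean, now weighted toward recent misses
```

With decay = 1.0 this reduces to the standard conjugate Beta update; smaller values make the agent forget faster.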
In short: recommenders exploit patterns across users and items in large spaces, while RL adapts to each user in real time over structured choices. Aampe uses both, each matched to the decision it handles best.