A common question: how can agentic learners recognize what counts as a “good” outcome when different users are at very different stages of their lifecycle?
The answer is that agents don’t solve this explicitly. They solve it implicitly through the structure of the reward function.
If you define a bottom-of-funnel event like “complete a purchase” as the target, the agent then models statistical relationships between all other events and that target. “Add to cart” shows up as strongly related, “open the app” as weakly related—but both have non-zero weight.
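To make this concrete, here is a minimal sketch of one way to measure how strongly each event relates to a target. It uses simple lift (the purchase rate among users who performed an event, divided by the overall purchase rate) as a stand-in; the model an agent actually uses is not specified here. The function name `event_lift`, the event names, and the toy user histories are all illustrative assumptions.

```python
from collections import Counter


def event_lift(histories, target):
    """Return {event: lift}, where lift = P(target | event) / P(target).

    `histories` is a list of sets, one set of event names per user.
    A lift of 1.0 means the event carries no information about the target.
    """
    n_users = len(histories)
    base_rate = sum(target in h for h in histories) / n_users
    if base_rate == 0:
        raise ValueError("target event never occurs; lift is undefined")

    seen = Counter()              # users who performed each event
    seen_with_target = Counter()  # ...and who also reached the target
    for history in histories:
        for event in history - {target}:
            seen[event] += 1
            if target in history:
                seen_with_target[event] += 1

    return {e: (seen_with_target[e] / seen[e]) / base_rate for e in seen}


# Toy data (hypothetical): one set of observed events per user.
histories = [
    {"open_app", "browse", "add_to_cart", "purchase"},
    {"open_app", "browse", "add_to_cart", "purchase"},
    {"open_app", "browse", "add_to_cart"},
    {"open_app", "browse"},
    {"open_app", "browse"},
    {"open_app"},
    {"open_app", "purchase"},
    {"open_app"},
    set(),  # user with no recorded activity
    set(),
]

weights = event_lift(histories, target="purchase")
```

On this toy data, `add_to_cart` gets a clearly higher lift than `open_app`, yet both sit above the no-relationship value of 1.0, which mirrors the "strong vs. weak, but never zero" pattern described above.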
For example, the visualization below maps the statistical relationships between events in a travel app. Each row corresponds to a product vertical — flights, hotels, buses — and within each row the nodes represent funnel stages such as browsing, selecting, booking, and paying. Lines show how activity at one stage and vertical statistically feeds into the likelihood of later stages and other verticals. The point is that agents don’t need to predefine different targets for different users: the network of relationships encodes how signals propagate across the funnel.
Agents learn to interpret shifts in those signals. Whether a message helped or hurt is not judged from raw counts of completed objectives. It is judged from how a user’s signals move relative to what that user was already doing.
➡️ If a user was doing very little, then received a message and added to cart, that looks moderately effective.
➡️ If they only opened the app, that looks minimally effective.
➡️ If they were already browsing heavily, then received a message and only opened the app once more, that looks counterproductive.
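The three cases above can be sketched as a before/after comparison. This is a simplified illustration, not a description of any production system: the weights in `EVENT_WEIGHTS` are made-up numbers, the helper names are hypothetical, and the "before" and "after" windows are assumed to cover equal lengths of time.

```python
# Hypothetical weights: how strongly each event relates to the target.
EVENT_WEIGHTS = {
    "open_app": 0.1,
    "browse": 0.3,
    "add_to_cart": 0.6,
    "purchase": 1.0,
}


def activity_score(events, weights=EVENT_WEIGHTS):
    """Sum the weights of the events observed in one time window."""
    return sum(weights.get(event, 0.0) for event in events)


def signal_shift(before, after, weights=EVENT_WEIGHTS):
    """Change in weighted activity from the pre-message window to the
    post-message window. Positive means the user moved toward the target."""
    return activity_score(after, weights) - activity_score(before, weights)


# Quiet user who adds to cart after the message.
quiet_then_cart = signal_shift(before=[], after=["open_app", "add_to_cart"])

# Quiet user who only opens the app after the message.
quiet_then_open = signal_shift(before=[], after=["open_app"])

# Heavy browser who only opens the app once after the message.
busy_then_open = signal_shift(
    before=["open_app"] + ["browse"] * 4,
    after=["open_app"],
)
```

Under these assumptions the first shift is the largest, the second is small but positive, and the third is negative, even though the post-message behavior in the last two cases is identical. The same scoring function handles all three users; only their baselines differ.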
Because the agentic learner evaluates shifts rather than binary outcomes, it doesn’t need separate logic for “active” vs. “inactive” or “abandoned cart” vs. “just looking” users. The network of statistical relationships between events does the heavy lifting, and the same analytic machinery generalizes across lifecycle stages.