In a recent post, I wrote that our agents treat each user as their own unique context - they don’t generalize across users. One reader pointed out that zero transfer between users would be impractical for most business contexts, which is entirely correct. My one-context-per-user statement was an accurate portrayal of how semantic-associative agents learn user preferences, but it's not the full picture of how those preferences get deployed to make decisions.
We essentially have three scenarios:
(Practically) fully known users
These are users whose agents have already served them enough treatments to cover most actions in the multi-dimensional action space. For each of these users, the agent learns parameters for each of those actions. This is full personalization, and it's the ideal scenario for an agentic system.
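To make the later scenarios concrete, here's a minimal sketch of that per-user, per-action parameter store. Every name in it (ActionParams, UserModel, response_rate) is an illustrative assumption, not the actual system's internals:

```python
from dataclasses import dataclass, field

@dataclass
class ActionParams:
    # Hypothetical learned parameters for one (user, action) pair:
    # an estimated response rate plus how much evidence backs it.
    response_rate: float
    n_observations: int

@dataclass
class UserModel:
    # Parameters keyed by action id; a (practically) fully known user
    # has an entry for most actions in the action space.
    params: dict[str, ActionParams] = field(default_factory=dict)

    def coverage(self, action_space: set[str]) -> float:
        """Fraction of the action space this user's agent has learned directly."""
        return len(self.params.keys() & action_space) / len(action_space)
```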
Partially known users
These are users whose agents have learned parameters for some actions, but other actions just haven't been tried yet. In this case, you can pick your imputation procedure of choice (auto-encoders, iterative predictive modeling of individual features, approximate nearest-neighbors, etc.) to fill in the gaps. This is the messy middle-ground reality of most users most of the time.
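As one concrete way of filling those gaps, here's a rough nearest-neighbor imputation over the UserModel sketch above. The cosine-style similarity and the choice of k are placeholder assumptions, not the author's actual procedure:

```python
import math

def similarity(a: UserModel, b: UserModel) -> float:
    """Cosine-style similarity over the actions both users have parameters for."""
    shared = a.params.keys() & b.params.keys()
    if not shared:
        return 0.0
    va = [a.params[k].response_rate for k in shared]
    vb = [b.params[k].response_rate for k in shared]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0

def impute_action(target: UserModel, action: str,
                  known_users: list[UserModel], k: int = 10) -> float:
    """Fill a missing action parameter from the k most similar users who have it."""
    candidates = [u for u in known_users if action in u.params]
    if not candidates:
        raise ValueError("no user has parameters for this action")
    neighbors = sorted(candidates, key=lambda u: similarity(target, u), reverse=True)[:k]
    weights = [similarity(target, u) for u in neighbors]
    total = sum(weights)
    if total == 0.0:
        # No overlap with any neighbor: fall back to an unweighted mean.
        return sum(u.params[action].response_rate for u in neighbors) / len(neighbors)
    # Similarity-weighted estimate of the missing parameter.
    return sum(w * u.params[action].response_rate
               for w, u in zip(weights, neighbors)) / total
```

An auto-encoder or iterative predictive model would slot into the same place: anything that maps a user's known parameters to estimates of the unknown ones.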
Unknown users
I said before that we don't use bandits, and that is true, but for users who have no data at all - they're brand new, or they've never responded to anything - we can look at the better-known users and come up with a global aggregation of patterns. (Not an average - empirically, we've found that to be essentially the same as imposing a strong flat prior, and I wouldn't recommend it.) This is equivalent in principle to a multi-armed bandit, even though the process for arriving at the aggregates is different.
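The post doesn't specify what that aggregation is, only that it isn't a plain average, so treat this as a hedged stand-in: per-action medians over users with enough direct evidence, which play the same role as a bandit's per-arm value estimates for a user with no data:

```python
import statistics

def global_action_estimates(known_users: list[UserModel],
                            min_observations: int = 5) -> dict[str, float]:
    """Per-action estimates aggregated only from users with enough direct evidence."""
    by_action: dict[str, list[float]] = {}
    for user in known_users:
        for action, p in user.params.items():
            if p.n_observations >= min_observations:
                by_action.setdefault(action, []).append(p.response_rate)
    # Median rather than mean, as a placeholder for "not an average";
    # the real aggregation could be anything that avoids the flat-prior effect.
    return {action: statistics.median(rates) for action, rates in by_action.items()}
```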
Incidentally, the bandit proxies for unknown users enable use cases that involve anonymous users - you can use an identity proxy (IP address, user agent string, ad id, etc.), and as long as that id sticks around, they'll be treated as a partially known user; when the id cycles, they'll be treated as a new user.
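A rough sketch of that proxy-identity resolution; the hashing scheme, the in-memory store, and the specific proxies are illustrative assumptions:

```python
import hashlib

user_models: dict[str, UserModel] = {}  # keyed by proxy id

def resolve_user(ip: str | None = None, user_agent: str | None = None,
                 ad_id: str | None = None) -> UserModel:
    """Treat whoever is behind a stable proxy id as the same, partially known user."""
    raw = "|".join(filter(None, [ip, user_agent, ad_id]))
    if not raw:
        raise ValueError("need at least one identity proxy")
    proxy_id = hashlib.sha256(raw.encode()).hexdigest()
    # When the proxy cycles (new IP, cleared ad id, ...), the hash changes and
    # the visitor starts over as an unknown user with an empty model.
    return user_models.setdefault(proxy_id, UserModel())
```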
So, essentially, in cases where agents lack a history with a particular action, they impute the parameters for that action from users with similar profiles. And when an agent knows absolutely nothing about a user, it falls back to the wisdom of the crowd. In other words, agentic learners do what humans do - they use more precise information when it's available, and fall back to less precise information when necessary to avoid decision paralysis.
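Putting the three scenarios together, the decision flow reduces to a tiered fallback, something like this sketch built on the assumed helpers above:

```python
def estimate(user: UserModel, action: str,
             known_users: list[UserModel],
             global_estimates: dict[str, float]) -> float:
    if action in user.params:        # known for this (user, action) pair
        return user.params[action].response_rate
    if user.params:                  # partially known user: impute from similar users
        return impute_action(user, action, known_users)
    return global_estimates[action]  # unknown user: wisdom of the crowd
```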