Adaptive Recommender Systems: A Bandit Approach
DJ Rich
Contributor

Recommender systems are arguably the highest-value category of applied machine learning. Netflix presents entertainment viewers want to watch. Spotify recommends music listeners want to hear. Google retrieves websites users want to visit. Amazon shows products customers want to purchase. The algorithms beneath these decisions create enormous enterprise value because without them, users could not discover the inventory they value most.

At Aampe, we’ve experienced the value of these systems in driving user engagement and sales. One of the best ways to produce a user response to a message is to recommend something they want to interact with. A user who has purchased all of a summer outfit except a hat is quite likely to return if notified of the missing hat. We’ve seen this in our data: when recommendations are included in messaging, purchases increase.

In this post, we will explain how we’ve gotten here. We’ll discuss recommender systems in general, the challenges they present, and the technology we’re developing at Aampe to address them.

Recommender Systems and Their Challenges

A recommender system is the set of software collectively responsible for exposing preferred items to users. As a whole, it is responsible for best matching the diversity of users with a company’s inventory. These are called ‘systems’ and not ‘models’ because they comprise components spanning the separate functions of a digital business, from the instrumentation measuring users’ micro-actions, to high-capacity machine learning models, to the UI where users browse items.

A notorious challenge of recommenders is model evaluation. To differentiate it from the encapsulating system, a recommender model is the component that, given a user and the context of the interaction (e.g. time of year, what else is on screen at the time), recommends a small set of items. Model evaluation is exceptionally difficult because user behavior, the phenomenon being modeled, depends on the model itself. Which items are interacted with depends on the recommender operating at the time. This makes the data partially irrelevant for evaluating a new and different recommender. Special care must be taken to either correct for the bias created by the existing recommender or run experiments, whose randomization carries an opportunity cost. This issue was highlighted in Herlocker et al. (2004), a seminal work on the unique challenges of recommender evaluation.
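To make the bias-correction idea concrete, below is a minimal sketch of inverse propensity scoring (IPS), one standard off-policy technique for re-weighting logged interactions. The log schema and function names here are hypothetical illustrations, not Aampe’s internals.

```python
import numpy as np

def ips_estimate(logs, new_policy_prob):
    """Estimate a candidate recommender's expected reward from logged data.

    logs: iterable of (context, item, reward, logging_prob) tuples, where
          logging_prob is the probability the *deployed* recommender
          assigned to showing the logged item.
    new_policy_prob: function (context, item) -> probability the candidate
          recommender would have shown that item.
    """
    estimates = []
    for context, item, reward, logging_prob in logs:
        # Re-weight each logged outcome by how much more (or less) often
        # the candidate policy would have shown this item. In practice the
        # weights are usually clipped to control variance.
        weight = new_policy_prob(context, item) / logging_prob
        estimates.append(weight * reward)
    return np.mean(estimates)
```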

Another noteworthy issue is scaling. User-item interactions generate high-volume data. For a fashion retailer, it is on the order of all transactions; for Netflix, on the order of all views. The techniques and technology must therefore be capable of handling massive data.

Related to the issue of evaluation is the challenge of tuning. Every model comes with a set of knobs that must be adjusted to the specific problem. Some models have many knobs and require extensive tuning. Others have fewer knobs and work relatively well out-of-the-box. We highlight tuning because, as a task, it does not scale well. With new data (technically generated by every recommender deployment) or a new model, tuning is required again. It must always be paid for in human labor and, if the model’s performance is sensitive to tuning, it is easy to get wrong, especially given the aforementioned difficulties of evaluation. Further, scale requirements slow iteration speed, reducing the total tuning work that can be done.

The challenges discussed here are by no means exhaustive, but are the ones we’ve specifically addressed in the design of Aampe’s system.

The Aampe Recommender Approach

Aampe’s approach centers on a bandit algorithm that dispatches each user to one of several recommender models.

A primary motivation for the bandit algorithm is that it partially addresses the evaluation challenge. Under this scheme, the data is never entirely conditional on a single recommender. In fact, there is randomization in the choice of recommenders, making this, in effect, an occasional A/B/n test. This diversity of data means a new recommender can be better evaluated. The data collected under any single recommender might bias the evaluation of a candidate recommender, but that bias is likely to be averaged out across all recommenders. Understanding these biases and how they relate is a promising area for future research, since it would improve how well our offline evaluation routine estimates performance in production.

Further, the bandit algorithm is an adaptive agent. If one model is substantially better than the others, it will eventually make more of the recommendations users see. If user preferences drift such that a different model becomes the best, the bandit algorithm will naturally reallocate user recommendations. This is adaptive in the sense that data scientists don’t need to manually re-evaluate performance and deploy the model they decided was best. It is an online, continuous evaluation procedure.
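As an illustration of the mechanism, here is a minimal Beta-Bernoulli Thompson sampling sketch over recommender models. The reward definition and model names are placeholder assumptions; the production agent is more involved.

```python
import random

class RecommenderBandit:
    """Thompson sampling over a set of recommender models.

    Each model carries a Beta(successes + 1, failures + 1) posterior over
    its rate of producing a goal event (e.g. a purchase after a message).
    """

    def __init__(self, model_names):
        self.successes = {name: 0 for name in model_names}
        self.failures = {name: 0 for name in model_names}

    def choose(self):
        # Sample a plausible success rate for each model and pick the best.
        # Sampling (rather than taking the posterior mean) supplies the
        # randomization that keeps data flowing to every recommender.
        samples = {
            name: random.betavariate(self.successes[name] + 1,
                                     self.failures[name] + 1)
            for name in self.successes
        }
        return max(samples, key=samples.get)

    def update(self, name, goal_event_observed):
        if goal_event_observed:
            self.successes[name] += 1
        else:
            self.failures[name] += 1

# Usage: dispatch each user, then credit the chosen model with the outcome.
bandit = RecommenderBandit(["matrix_factorization", "item_knn", "popularity"])
chosen = bandit.choose()
bandit.update(chosen, goal_event_observed=True)
```

If one model dominates, its posterior concentrates at a high rate and it wins most draws; if preferences drift, its failures accumulate and traffic reallocates, with no manual redeployment.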

Beyond adaptability over time, the system is adaptable across Aampe customers, who vary in their inventory and user base. The best recommender model for one customer may not be the best for another. The agent thus provides an essential degree of freedom, whereby the best recommender is automatically selected for each customer.

Engineering is another motivation for the bandit component. Contributing a recommender model is an easy, modular, safe, and repeatable procedure. A data scientist may design one independently of the existing recommenders, according to their own taste in techniques and evaluation metrics. Following a required offline evaluation, they may add their recommender to the existing set. If the offline evaluation holds up, their model will perform well and receive proportionally more traffic. If, despite the offline evaluation, it performs poorly, it will receive proportionally less.

Further, contributing a recommender is fault-tolerant and increases the fault tolerance of the encapsulating system. If the new recommendation model fails, the other recommenders serve as fallback options. Likewise, if any other recommender fails, the new recommender provides one extra layer of fallback redundancy.
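A hypothetical sketch of how the modularity and fallback described above might fit together: every contributed model implements a shared interface, and the dispatcher walks the remaining models when one fails. The interface and function names are illustrative, not Aampe’s actual code.

```python
from abc import ABC, abstractmethod

class Recommender(ABC):
    """Interface each contributed recommender model implements."""

    @abstractmethod
    def recommend(self, user_id: str, context: dict, k: int) -> list:
        """Return the top-k item ids for this user and context."""

def recommend_with_fallback(chosen, others, user_id, context, k=5):
    """Try the bandit-chosen recommender first; fall back to the rest."""
    for model in [chosen, *others]:
        try:
            items = model.recommend(user_id, context, k)
            if items:
                return items
        except Exception:
            continue  # this model failed; the next one is the fallback
    return []  # every recommender failed; the caller handles the empty case
```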

Which Recommender Models?

The bandit approach will only be as good as the recommender models allow it to be, so it is worth discussing the criteria for adding constituent models. At Aampe, we heavily prefer models that scale well and require relatively little tuning.

The scalability of a recommender model is a primary consideration because Aampe serves customers whose data volumes range from massive to modest. A scalable recommender can be used with all customers, whereas a data-constrained recommender cannot. To avoid managing many customer-specific sets of recommenders, we insist that all recommenders scale well. This rules out pure-Python approaches but is otherwise not terribly constraining, since most recommenders are designed with scale in mind.

In a similar vein of avoiding unnecessary labor, we prefer recommenders with relatively few knobs to tune. As mentioned, tuning must be manually repeated for every training set and is an opportunity for human error. As Aampe adds customers, it effectively becomes a resource constraint. Ultimately, this biases Aampe away from the most elaborate deep learning techniques. We believe this is a safe position given the conventional skepticism as to whether deep learning techniques are the best option in most cases; see Dacrema et al. (2019) for a replication study to this effect.
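As a concrete example of the kind of model this favors, consider a sparse item-item cosine recommender: it has almost no knobs and scales through sparse linear algebra rather than Python loops. This is a generic sketch, not Aampe’s model.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import normalize

def fit_item_item(interactions: sparse.csr_matrix) -> sparse.csr_matrix:
    """interactions: users x items sparse matrix of implicit feedback.

    Returns an items x items cosine similarity matrix. There are no
    hyperparameters here; a production version would add one knob for
    sparsifying the result.
    """
    item_rows = normalize(interactions.T.tocsr(), axis=1)  # L2-normalize each item
    return item_rows @ item_rows.T  # cosine similarity via one sparse matmul

def recommend(sim: sparse.csr_matrix, user_row: sparse.csr_matrix, k: int = 5):
    """Score items by similarity to the user's past interactions."""
    scores = np.asarray((user_row @ sim).todense()).ravel()
    scores[user_row.indices] = -np.inf  # mask items the user has already seen
    return np.argsort(-scores)[:k]
```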

Finally, the bandit design invites contributions from in-house data science teams that use Aampe’s platform. Teams can develop their own recommender models and let them compete within Aampe’s system. This is an attractive opportunity for collaboration because customers understand their user base differently than Aampe does. Their recommender could leverage domain knowledge or seasonality patterns that a domain-agnostic one would be slow to recognize. If such a model produced incremental goal events (e.g. purchases), the bandit algorithm would learn to prefer it. More generally, the design supports more expert contributions, which is likely to improve performance over the long term.

Outstanding Challenges

We are encouraged by our experience with this approach, but the issues of objective metric design and evaluation remain top of mind.

Objective metric design refers to the choice of loss function to optimize when training any recommender model. Generally, our goal is to increase the likelihood of some goal event, like a purchase. However, such events are rare. In practice, optimizing a surrogate objective that is less sparsely observed often works better than optimizing the rare goal event directly. How exactly to construct the best surrogate objective is an open question.
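One simple family of surrogates blends denser engagement signals with the sparse goal event. The event weights below are purely hypothetical; choosing them well is precisely the open question.

```python
# Hypothetical surrogate: denser signals stand in for the rare purchase.
EVENT_WEIGHTS = {
    "impression": 0.0,
    "click": 0.1,
    "add_to_cart": 0.4,
    "purchase": 1.0,  # the true, but sparsely observed, goal event
}

def surrogate_label(events: list[str]) -> float:
    """Collapse a user-item event history into one training target."""
    return max((EVENT_WEIGHTS.get(e, 0.0) for e in events), default=0.0)
```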

On evaluation, we are contemplating the same question considered by most recommender designers: what exactly makes for a good recommender? The very target of evaluation is vague. It is generally accepted that desirable recommendations encourage the target behavior while being ‘diverse, novel, and serendipitous’. Resolving this balance into a specific metric that yields a positive user experience and returning users is worthy of further attention. In the Aampe approach, we expect to incorporate such a metric into the bandit algorithm’s decision over recommender models.
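For instance, ‘diversity’ is often operationalized as the average pairwise dissimilarity within a recommended list. A sketch, assuming item embeddings are available from some upstream model:

```python
import numpy as np

def intra_list_diversity(item_vectors: np.ndarray) -> float:
    """Average pairwise cosine dissimilarity of a recommendation list.

    item_vectors: (k, d) array with one embedding per recommended item,
    k >= 2. Higher values indicate a more diverse list.
    """
    v = item_vectors / np.linalg.norm(item_vectors, axis=1, keepdims=True)
    sim = v @ v.T  # k x k cosine similarities
    k = len(v)
    mean_pairwise_sim = (sim.sum() - np.trace(sim)) / (k * (k - 1))
    return 1.0 - mean_pairwise_sim
```

Novelty and serendipity would need their own definitions, which is part of why resolving these criteria into one metric remains open.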

Conclusion

The Aampe approach addresses several challenges in designing effective recommender systems, from evaluation bias to scalability and tuning. By incorporating a bandit algorithm that dynamically reallocates traffic among recommenders based on performance, the system is adaptive, fault-tolerant, and easy to expand, both with contributions from customers and from the Aampe team. While open questions remain about objective metric design and evaluation, the system’s flexibility and scalability position Aampe to continually improve user experience and drive revenue.
