Relative Slate Bandits for E-com Homepage Picks
Use group-relative contextual bandits to select the best product slate for an e-commerce homepage, learning from relative quality signals among candidate slates instead of training a full prediction model.
Slate Selection Beats Item Picking in Homepage Recs
An e-commerce homepage requires choosing one complete product slate (a display plan) from a set of candidates, such as recent-browsing matches, promotions, high-margin pushes, or balanced mixes. This single first-screen decision drives clicks, add-to-carts, conversions, margins, and downstream session behavior. The available context (user history, session signals, time and campaign state, business metrics) supports a compact model that does not need full item-level predictions.
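To make the setup concrete, here is a minimal sketch of how a context and a handful of candidate slates might be turned into one feature vector per slate. Every field name, slate share, and interaction term below is a hypothetical stand-in chosen for illustration; the source does not specify a feature schema.

```python
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class Context:
    # Hypothetical stand-ins for the context signals listed above.
    recent_category_affinity: float  # user history, in [0, 1]
    session_depth: int               # pages viewed so far in this session
    campaign_active: bool            # time/campaign state
    margin_pressure: float           # business metric, in [0, 1]


@dataclass
class Slate:
    name: str
    # Share of homepage slots given to each strategy (assumed to sum to ~1).
    browsing_share: float
    promo_share: float
    margin_share: float


def featurize(ctx: Context, slate: Slate) -> np.ndarray:
    """Joint context-slate features: slate shares plus simple interactions."""
    depth = min(ctx.session_depth, 10) / 10.0  # clip and scale to [0, 1]
    return np.array([
        slate.browsing_share,
        slate.promo_share,
        slate.margin_share,
        ctx.recent_category_affinity * slate.browsing_share,
        float(ctx.campaign_active) * slate.promo_share,
        ctx.margin_pressure * slate.margin_share,
        depth * slate.browsing_share,
    ])


# Pre-generated candidate slates, mirroring the four types named in the text.
CANDIDATES: List[Slate] = [
    Slate("recent_browsing", 0.8, 0.1, 0.1),
    Slate("promotions", 0.1, 0.8, 0.1),
    Slate("high_margin", 0.1, 0.1, 0.8),
    Slate("balanced", 0.34, 0.33, 0.33),
]
```

A policy then only has to score a few slate-level vectors per request, for example `np.stack([featurize(ctx, s) for s in CANDIDATES])`, rather than every item in the catalog.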
Bandit RL Over Pure Prediction
Treating homepage recommendation as a contextual bandit problem frames it as a one-shot decision under uncertainty: given the context, pick one slate and observe its reward. Slates compete on relative quality (e.g., one outperforms the others in an A/B test). Policy gradient methods train efficiently on these group-relative rewards, avoiding expensive full-slate simulation or prediction of every item interaction. The approach scales to production settings where candidate slates are pre-generated.
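The source does not give the exact policy gradient formulation, so the following is only one plausible sketch: a softmax policy over candidate slates updated with REINFORCE, where each sampled slate's advantage is its reward relative to the mean of its own group. The linear scoring, the function names, the standard-deviation scaling, and the simulated click rates are all assumptions made for illustration.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # subtract the max to stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()


def group_relative_update(theta, X, picks, rewards, lr=0.1):
    """One REINFORCE step using group-relative advantages.

    theta:   (d,) weights of a linear softmax policy (an assumed policy form)
    X:       (K, d) features of the K candidate slates for this context
    picks:   indices of the slates sampled in this group
    rewards: observed reward for each sampled slate
    """
    rewards = np.asarray(rewards, dtype=float)
    probs = softmax(X @ theta)
    # Advantage: reward relative to the group's own mean, scaled by its spread.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    expected_x = probs @ X  # E_pi[x], the policy's mean feature vector
    grad = np.zeros_like(theta, dtype=float)
    for k, a in zip(picks, adv):
        # Gradient of log softmax probability of slate k: x_k - E_pi[x]
        grad += a * (X[k] - expected_x)
    return theta + lr * grad / len(picks)


# Toy simulation with made-up click rates for four candidate slates.
rng = np.random.default_rng(0)
X_slates = np.eye(4)                          # one-hot features, one per slate
true_rate = np.array([0.05, 0.10, 0.30, 0.15])  # hypothetical, not from data
theta = np.zeros(4)
for _ in range(500):
    probs = softmax(X_slates @ theta)
    picks = rng.choice(len(X_slates), size=8, p=probs)  # sample a group
    rewards = (rng.random(8) < true_rate[picks]).astype(float)
    theta = group_relative_update(theta, X_slates, picks, rewards)
```

Because the advantages in a group sum to zero, a group in which every slate earns the same reward produces no update. Only differences within the group move the policy, which is the sense in which the signal is relative.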