Relative Slate Bandits for E-commerce Homepage Picks

Use group-relative contextual bandits to select product slates for e-commerce homepages. Relative quality signals between candidate slates make reinforcement learning (RL) efficient without requiring a full prediction model.

Slate Selection Beats Item Picking in Homepage Recommendations

An e-commerce homepage requires choosing one complete product slate (a display plan) from a set of candidates, such as recent-browsing matches, promotions, high-margin pushes, or balanced mixes. This single first-screen decision drives clicks, add-to-carts, conversions, margins, and session behavior. The available context (user history, session signals, time and campaign state, business metrics) supports a compact model that does not need full item-level predictions.
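One way to picture the inputs is as a small context record plus a list of pre-built candidate slates. The sketch below is illustrative only: the class names `HomepageContext` and `CandidateSlate`, every field name, and the `context_features` helper are assumptions made for this example, not definitions from the source.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class HomepageContext:
    """Hypothetical compact context; each field stands in for one signal family."""
    recent_category_views: int  # user history
    session_minutes: float      # session signal
    campaign_active: bool       # time/campaign state
    margin_target: float        # business metric, as a fraction in [0, 1]


@dataclass(frozen=True)
class CandidateSlate:
    """Hypothetical pre-generated slate: a named, ordered set of item ids."""
    name: str
    item_ids: Tuple[str, ...]


def context_features(ctx: HomepageContext) -> List[float]:
    """Flatten the context into a numeric vector a policy could consume."""
    return [
        float(ctx.recent_category_views),
        ctx.session_minutes,
        1.0 if ctx.campaign_active else 0.0,
        ctx.margin_target,
    ]


# Example candidates mirroring the slate types named in the text.
CANDIDATES = [
    CandidateSlate("recent_browsing", ("sku-1", "sku-2", "sku-3")),
    CandidateSlate("promotions", ("sku-7", "sku-8", "sku-9")),
    CandidateSlate("high_margin", ("sku-4", "sku-5", "sku-6")),
    CandidateSlate("balanced_mix", ("sku-1", "sku-7", "sku-4")),
]
```

The decision is then a single choice among `CANDIDATES` given `context_features(ctx)`, not a ranking over individual items.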

Bandit RL Over Pure Prediction

Treating homepage recommendations as a contextual bandit problem frames each page load as a decision under uncertainty: the system observes feedback only for the slate it shows, and slates compete on relative quality (for example, one outperforms the others in A/B tests). Policy gradient methods train efficiently on these group-relative rewards and avoid expensive full-slate simulation or prediction of every item interaction. The approach scales to production settings where candidate slates are pre-generated.
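Since the exact policy gradient formulation is not given, the following is only a minimal sketch of one plausible reading: a softmax policy over candidate slates, updated with a REINFORCE-style gradient in which each reward is centered and scaled against the other rewards sampled for the same context. The linear policy, the function names, the hyperparameters, and the deterministic `toy_reward` table are all assumptions for illustration. In particular, the sketch assumes a reward estimate is available for every sampled slate, for example from a simulator or logged A/B data.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def slate_probs(weights, context):
    """Policy pi(slate | context): softmax of one linear score per slate."""
    scores = [sum(w * x for w, x in zip(row, context)) for row in weights]
    return softmax(scores)


def group_relative_advantages(rewards, eps=1e-8):
    """Center and scale rewards within one group sampled for the same context."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def train_step(weights, context, reward_fn, rng, group_size=4, lr=0.5):
    """Sample a group of slates, score them relative to each other, update in place."""
    probs = slate_probs(weights, context)  # probabilities at the pre-update weights
    picks = rng.choices(range(len(weights)), weights=probs, k=group_size)
    rewards = [reward_fn(context, k) for k in picks]
    advantages = group_relative_advantages(rewards)
    for k, adv in zip(picks, advantages):
        for j, row in enumerate(weights):
            # d log pi(k | x) / d weights[j] = (1[j == k] - pi_j) * x
            indicator = 1.0 if j == k else 0.0
            coeff = lr * adv * (indicator - probs[j]) / group_size
            for d, x in enumerate(context):
                row[d] += coeff * x


# Hypothetical toy environment: two one-hot user segments, three candidate slates.
# Slate 0 = recent-browsing matches, 1 = balanced mix, 2 = promotions.
REWARD_TABLE = {
    (1.0, 0.0): [1.0, 0.5, 0.0],  # segment A responds best to recent-browsing
    (0.0, 1.0): [0.0, 0.5, 1.0],  # segment B responds best to promotions
}


def toy_reward(context, slate_index):
    """Deterministic stand-in for an observed or estimated slate reward."""
    return REWARD_TABLE[tuple(context)][slate_index]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [[0.0, 0.0] for _ in range(3)]  # one weight row per slate
    for step in range(300):
        context = [1.0, 0.0] if step % 2 == 0 else [0.0, 1.0]
        train_step(weights, context, toy_reward, rng)
    print("segment A:", slate_probs(weights, [1.0, 0.0]))
    print("segment B:", slate_probs(weights, [0.0, 1.0]))
```

Because the advantages within a group sum to zero, a slate's score rises only when it beats the group average for that context. No absolute reward prediction is needed, which is the efficiency argument made above. A group in which every sampled slate earns the same reward contributes no update.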

Note: The source content is truncated, so deeper method details, such as the exact policy gradient formulation, are not available.

Summarized by x-ai/grok-4.1-fast via openrouter


© 2026 Edge